r.ShaderCompiler.DistributedMinBatchSize
#Overview
name: r.ShaderCompiler.DistributedMinBatchSize
This variable is created as a Console Variable (cvar).
- type: Var
- help: Minimum number of shaders to compile with a distributed controller.\nSmaller number of shaders will compile locally.
It is referenced in 40 C++ source files.
#Usage in the C++ source code
The purpose of r.ShaderCompiler.DistributedMinBatchSize is to set the minimum number of shaders to compile with a distributed shader compiler controller. Smaller numbers of shaders will be compiled locally instead of using the distributed compilation system.
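As an illustrative example (the value 100 is arbitrary), the threshold can be raised persistently via a config file such as ConsoleVariables.ini, or at runtime from the in-editor console:
; Engine/Config/ConsoleVariables.ini (or a project-level override)
r.ShaderCompiler.DistributedMinBatchSize=100
At the console, the equivalent command is: r.ShaderCompiler.DistributedMinBatchSize 100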
Key points about this setting variable:
- It is used by the shader compilation system, specifically to gate distributed compilation.
- The Unreal Engine shader compiler subsystem reads it to decide between distributed and local compilation.
- The value is set via console variable, with a default of 50 provided in the code.
- It is bound to the MinBatchSize variable in the DistributedShaderCompilerVariables namespace; the deprecated r.XGEShaderCompile.MinBatchSize cvar writes to the same variable.
- Setting this too low can result in inefficient use of the distributed compilation system for very small shader batches.
- Best practice is to tune this value for the specific project and build environment to balance compilation speed and resource utilization.
Note that MinBatchSize is a common identifier in the engine: most of the other occurrences listed below are unrelated local variables and function parameters that merely share the name, in areas such as:
- Nanite displaced mesh compilation
- USD asset importing
- Parallel processing of various asset types (animations, static meshes, etc)
- Physics simulation parallelization
- Plugin discovery
When choosing a minimum batch size for parallel work, developers should consider:
- The tradeoff between parallelism granularity and overhead
- Adjusting the value based on the specific workload characteristics
- That different subsystems may have different optimal batch sizes
In general, these batch-size parameters allow fine-tuning of parallel processing behavior across multiple engine subsystems. Careful adjustment can improve performance, but any change should be verified by testing.
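To make the tradeoff concrete, here is a minimal sketch of passing a minimum batch size to ParallelFor, using the (DebugName, Num, MinBatchSize, Body) overload shown in the excerpts below; FMyItem, its Process method, and the value 64 are hypothetical placeholders:
#include "CoreMinimal.h"
#include "Async/ParallelFor.h"
void ProcessItems(TArray<FMyItem>& Items)
{
	// With MinBatchSize = 64, at most DivUp(Items.Num(), 64) workers are launched,
	// so small arrays run on one or a few threads and avoid task-scheduling overhead.
	const int32 MinBatchSize = 64;
	ParallelFor(TEXT("ProcessItems.PF"), Items.Num(), MinBatchSize,
		[&Items](int32 Index)
		{
			Items[Index].Process(); // hypothetical per-item work
		});
}
A larger minimum batch size reduces per-item scheduling overhead but caps parallelism; a smaller one does the opposite.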
#References in C++ code
#Callsites
This variable is referenced in the following C++ source code:
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:17
Scope (from outer to inner):
file
namespace DistributedShaderCompilerVariables
Source code excerpt:
FAutoConsoleVariableRef CVarDistributedMinBatchSize(
TEXT("r.ShaderCompiler.DistributedMinBatchSize"),
MinBatchSize,
TEXT("Minimum number of shaders to compile with a distributed controller.\n")
TEXT("Smaller number of shaders will compile locally."),
ECVF_Default);
static int32 GDistributedControllerTimeout = 15 * 60;
#Associated Variable and Callsites
This variable is associated with another variable named MinBatchSize; they share the same value. See the following C++ source code. (Note that several excerpts below show unrelated locals and parameters that reuse the MinBatchSize name in other subsystems.)
#Loc: <Workspace>/Engine/Plugins/Experimental/NaniteDisplacedMesh/Source/NaniteDisplacedMesh/Private/NaniteDisplacedMeshCompiler.cpp:387
Scope (from outer to inner):
file
function void FNaniteDisplacedMeshCompilingManager::ProcessNaniteDisplacedMeshes
Source code excerpt:
void FNaniteDisplacedMeshCompilingManager::Reschedule()
{
// TODO Prioritize nanite displaced mesh that are nearest to the viewport
}
void FNaniteDisplacedMeshCompilingManager::ProcessNaniteDisplacedMeshes(bool bLimitExecutionTime, int32 MinBatchSize)
{
using namespace NaniteDisplacedMeshCompilingManagerImpl;
TRACE_CPUPROFILER_EVENT_SCOPE(FNaniteDisplacedMeshCompilingManager::ProcessNaniteDisplacedMeshes);
const int32 NumRemainingMeshes = GetNumRemainingAssets();
// Spread out the load over multiple frames but if too many meshes, convergence is more important than frame time
const int32 MaxMeshUpdatesPerFrame = bLimitExecutionTime ? FMath::Max(64, NumRemainingMeshes / 10) : INT32_MAX;
FObjectCacheContextScope ObjectCacheScope;
if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
{
TSet<UNaniteDisplacedMesh*> NaniteDisplacedMeshesToProcess;
for (TWeakObjectPtr<UNaniteDisplacedMesh>& NaniteDisplacedMesh : RegisteredNaniteDisplacedMesh)
{
if (NaniteDisplacedMesh.IsValid())
{
NaniteDisplacedMeshesToProcess.Add(NaniteDisplacedMesh.Get());
}
}
{
#Loc: <Workspace>/Engine/Plugins/Experimental/NaniteDisplacedMesh/Source/NaniteDisplacedMesh/Private/NaniteDisplacedMeshCompiler.h:77
Scope (from outer to inner):
file
class class FNaniteDisplacedMeshCompilingManager : public IAssetCompilingManager, public FGCObject
Source code excerpt:
TUniquePtr<FAsyncCompilationNotification> Notification;
void FinishCompilationsForGame();
void Reschedule();
void ProcessNaniteDisplacedMeshes(bool bLimitExecutionTime, int32 MinBatchSize = 1);
void UpdateCompilationNotification();
void PostCompilation(TArrayView<UNaniteDisplacedMesh* const> InNaniteDisplacedMeshes);
void PostCompilation(UNaniteDisplacedMesh* InNaniteDisplacedMesh);
void OnPostReachabilityAnalysis();
FDelegateHandle PostReachabilityAnalysisHandle;
};
#endif // WITH_EDITOR
#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDSchemas/Private/USDInfoCache.cpp:669
Scope (from outer to inner):
file
namespace UE::USDInfoCacheImpl::Private
function void RecursivePropagateVertexAndMaterialSlotCounts
Source code excerpt:
ChildSubtreeVertexCounts.SetNumUninitialized(NumChildren);
TArray<TArray<UsdUtils::FUsdPrimMaterialSlot>> ChildSubtreeMaterialSlots;
ChildSubtreeMaterialSlots.SetNum(NumChildren);
const int32 MinBatchSize = 1;
ParallelFor(
TEXT("RecursivePropagateVertexAndMaterialSlotCounts"),
Prims.Num(),
MinBatchSize,
[&](int32 Index)
{
RecursivePropagateVertexAndMaterialSlotCounts(
Prims[Index],
Context,
MaterialPurposeToken,
Impl,
Registry,
InOutSubtreeToMaterialSlots,
InOutPointInstancerPaths,
ChildSubtreeVertexCounts[Index],
#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDSchemas/Private/USDInfoCache.cpp:1010
Scope (from outer to inner):
file
namespace UE::USDInfoCacheImpl::Private
function void RecursiveQueryCollapsesChildren
Source code excerpt:
for (pxr::UsdPrim Child : PrimChildren)
{
Prims.Emplace(Child);
}
const int32 MinBatchSize = 1;
ParallelFor(
TEXT("RecursiveQueryCollapsesChildren"),
Prims.Num(),
MinBatchSize,
[&](int32 Index)
{
RecursiveQueryCollapsesChildren(Prims[Index], Context, Impl, Registry, *AssetCollapsedRootOverride, *ComponentCollapsedRootOverride);
}
);
{
FWriteScopeLock ScopeLock(Impl.InfoMapLock);
UE::UsdInfoCache::Private::FUsdPrimInfo& Info = Impl.InfoMap.FindOrAdd(UE::FSdfPath{UsdPrimPath});
#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDSchemas/Private/USDInfoCache.cpp:1243
Scope (from outer to inner):
file
namespace UE::USDInfoCacheImpl::Private
function void RecursiveCheckForGeometryCache
Source code excerpt:
Depths.SetNum(Prims.Num());
TArray<EGeometryCachePrimState> States;
States.SetNum(Prims.Num());
const int32 MinBatchSize = 1;
ParallelFor(
TEXT("RecursiveCheckForGeometryCache"),
Prims.Num(),
MinBatchSize,
[&Prims, &Context, &Impl, bIsInsideSkelRoot, &Depths, &States](int32 Index)
{
RecursiveCheckForGeometryCache(
Prims[Index],
Context,
Impl,
bIsInsideSkelRoot || Prims[Index].IsA<pxr::UsdSkelRoot>(),
Depths[Index],
States[Index]
);
}
#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDUtilities/Private/USDSkeletalDataConversion.cpp:691
Scope (from outer to inner):
file
namespace UsdToUnrealImpl
function void CreateMorphTargets
Source code excerpt:
}
}
}
}
const int32 MinBatchSize = 1;
ParallelFor(
TEXT("CreateMorphTarget"),
MorphTargetJobs.Num(),
MinBatchSize,
[&MorphTargetJobs, &OrigIndexToBuiltIndicesPerLOD, &TempMeshBundlesPerLOD, ImportedResource](int32 Index)
{
TRACE_CPUPROFILER_EVENT_SCOPE(USDSkeletalDataConversion::CreateMorphTargetJob);
FMorphTargetJob& Job = MorphTargetJobs[Index];
if (!Job.BlendShape || !Job.MorphTarget)
{
return;
}
for (int32 LODIndex : Job.BlendShape->LODIndicesThatUseThis)
#Loc: <Workspace>/Engine/Source/Runtime/Core/Private/GenericPlatform/GenericPlatformFile.cpp:625
Scope (from outer to inner):
file
function bool IPlatformFile::IterateDirectoryRecursively
Source code excerpt:
};
TArray<FString> DirectoriesToVisit;
DirectoriesToVisit.Add(Directory);
constexpr int32 MinBatchSize = 1;
const EParallelForFlags ParallelForFlags = FTaskGraphInterface::IsRunning() && Visitor.IsThreadSafe()
? EParallelForFlags::Unbalanced : EParallelForFlags::ForceSingleThread;
std::atomic<bool> bResult{true};
TArray<TArray<FString>> DirectoriesToVisitNext;
while (bResult && DirectoriesToVisit.Num() > 0)
{
ParallelForWithTaskContext(TEXT("IterateDirectoryRecursively.PF"),
DirectoriesToVisitNext,
DirectoriesToVisit.Num(),
MinBatchSize,
[this, &Visitor, &DirectoriesToVisit, &bResult](TArray<FString>& Directories, int32 Index)
{
FRecurse Recurse(Visitor, Directories);
if (bResult.load(std::memory_order_relaxed) && !IterateDirectory(*DirectoriesToVisit[Index], Recurse))
{
bResult.store(false, std::memory_order_relaxed);
}
},
ParallelForFlags);
DirectoriesToVisit.Reset(Algo::TransformAccumulate(DirectoriesToVisitNext, &TArray<FString>::Num, 0));
for (TArray<FString>& Directories : DirectoriesToVisitNext)
#Loc: <Workspace>/Engine/Source/Runtime/Core/Private/GenericPlatform/GenericPlatformFile.cpp:635
Scope (from outer to inner):
file
function bool IPlatformFile::IterateDirectoryRecursively
Source code excerpt:
DirectoriesToVisitNext,
DirectoriesToVisit.Num(),
MinBatchSize,
[this, &Visitor, &DirectoriesToVisit, &bResult](TArray<FString>& Directories, int32 Index)
{
FRecurse Recurse(Visitor, Directories);
if (bResult.load(std::memory_order_relaxed) && !IterateDirectory(*DirectoriesToVisit[Index], Recurse))
{
bResult.store(false, std::memory_order_relaxed);
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:77
Scope (from outer to inner):
file
namespace ParallelForImpl
function inline int32 GetNumberOfThreadTasks
Source code excerpt:
inline void CallBody(const FunctionType& Body, const TArrayView<TYPE_OF_NULLPTR>&, int32, int32 Index)
{
Body(Index);
}
inline int32 GetNumberOfThreadTasks(int32 Num, int32 MinBatchSize, EParallelForFlags Flags)
{
int32 NumThreadTasks = 0;
const bool bIsMultithread = FApp::ShouldUseThreadingForPerformance() || FForkProcessHelper::IsForkedMultithreadInstance();
if (Num > 1 && (Flags & EParallelForFlags::ForceSingleThread) == EParallelForFlags::None && bIsMultithread)
{
NumThreadTasks = FMath::Min(int32(LowLevelTasks::FScheduler::Get().GetNumWorkers()), (Num + (MinBatchSize/2))/MinBatchSize);
}
if (!LowLevelTasks::FScheduler::Get().IsWorkerThread())
{
NumThreadTasks++; //named threads help with the work
}
// don't go wider than number of cores
NumThreadTasks = FMath::Min(NumThreadTasks, FPlatformMisc::NumberOfCoresIncludingHyperthreads());
return FMath::Max(NumThreadTasks, 1);
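To illustrate the worker-count formula above with concrete (illustrative) numbers: with Num = 1000 work items, MinBatchSize = 50, and 32 scheduler workers, (1000 + 50/2) / 50 = 20 in integer arithmetic, so 20 tasks are used rather than 32. With Num = 40, the same expression gives (40 + 25) / 50 = 1, so the loop effectively stays on a single task (plus the calling named thread, when it is not itself a worker).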
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:101
Scope (from outer to inner):
file
namespace ParallelForImpl
Source code excerpt:
/**
* General purpose parallel for that uses the taskgraph
* @param DebugName; Debugname and Profiling TraceTag
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum size a Batch should have
* @param Body; Function to call from multiple threads
* @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* @param Contexts; Optional per thread contexts to accumulate data concurrently.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template<typename BodyType, typename PreWorkType, typename ContextType>
inline void ParallelForInternal(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, BodyType Body, PreWorkType CurrentThreadWorkToDoBeforeHelping, EParallelForFlags Flags, const TArrayView<ContextType>& Contexts)
{
if (Num == 0)
{
// Contract is that prework should always be called even when number of tasks is 0.
// We omit the trace scope here to avoid noise when the prework is empty since this amounts to just calling a function anyway with nothing specific to parallelfor itself.
CurrentThreadWorkToDoBeforeHelping();
return;
}
SCOPE_CYCLE_COUNTER(STAT_ParallelFor);
TRACE_CPUPROFILER_EVENT_SCOPE(ParallelFor);
check(Num >= 0);
int32 NumWorkers = GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
if (!Contexts.IsEmpty())
{
// Use at most as many workers as there are contexts when task contexts are used.
NumWorkers = FMath::Min(NumWorkers, Contexts.Num());
}
//single threaded mode
if (NumWorkers <= 1)
{
// do the prework
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:123
Scope (from outer to inner):
file
namespace ParallelForImpl
function inline void ParallelForInternal
Source code excerpt:
check(Num >= 0);
int32 NumWorkers = GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
if (!Contexts.IsEmpty())
{
// Use at most as many workers as there are contexts when task contexts are used.
NumWorkers = FMath::Min(NumWorkers, Contexts.Num());
}
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:482
Scope: file
Source code excerpt:
/**
* General purpose parallel for that uses the taskgraph
* @param DebugName; ProfilingScope and Debugname
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param bForceSingleThread; Mostly used for testing, if true, run single threaded instead.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template<typename FunctionType>
inline void ParallelForTemplate(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<TYPE_OF_NULLPTR>());
}
/**
* General purpose parallel for that uses the taskgraph for unbalanced tasks
* Offers better work distribution among threads at the cost of a little bit more synchronization.
* This should be used for tasks with highly variable computational time.
*
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param Body; Function to call from multiple threads
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:515
Scope: file
Source code excerpt:
* Offers better work distribution among threads at the cost of a little bit more synchronization.
* This should be used for tasks with highly variable computational time.
*
* @param DebugName; ProfilingScope and Debugname
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
inline void ParallelFor(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, TFunctionRef<void(int32)> Body, EParallelForFlags Flags = EParallelForFlags::None)
{
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<TYPE_OF_NULLPTR>());
}
/**
* General purpose parallel for that uses the taskgraph
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param Body; Function to call from multiple threads
* @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
* @param bForceSingleThread; Mostly used for testing, if true, run single threaded instead.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
inline void ParallelForWithPreWork(int32 Num, TFunctionRef<void(int32)> Body, TFunctionRef<void()> CurrentThreadWorkToDoBeforeHelping, bool bForceSingleThread, bool bPumpRenderingThread = false)
{
ParallelForImpl::ParallelForInternal(TEXT("ParallelFor Task"), Num, 1, Body, CurrentThreadWorkToDoBeforeHelping,
(bForceSingleThread ? EParallelForFlags::ForceSingleThread : EParallelForFlags::None) |
(bPumpRenderingThread ? EParallelForFlags::PumpRenderingThread : EParallelForFlags::None), TArrayView<TYPE_OF_NULLPTR>());
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:557
Scope: file
Source code excerpt:
* @param DebugName; ProfilingScope and Debugname
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
inline void ParallelForWithPreWork(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, TFunctionRef<void(int32)> Body, TFunctionRef<void()> CurrentThreadWorkToDoBeforeHelping, EParallelForFlags Flags = EParallelForFlags::None)
{
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, CurrentThreadWorkToDoBeforeHelping, Flags, TArrayView<TYPE_OF_NULLPTR>());
}
/**
* General purpose parallel for that uses the taskgraph
* @param DebugName; ProfilingScope and DebugName
* @param OutContexts; Array that will hold the user-defined, task-level context objects (allocated per parallel task)
* @param Num; number of calls of Body; Body(0), Body(1), ..., Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param ContextConstructor; Function to call to initialize each task context allocated for the operation
* @param Body; Function to call from multiple threads
* @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
*/
template <typename ContextType, typename ContextAllocatorType, typename ContextConstructorType, typename BodyType, typename PreWorkType>
inline void ParallelForWithPreWorkWithTaskContext(
const TCHAR* DebugName,
TArray<ContextType, ContextAllocatorType>& OutContexts,
int32 Num,
int32 MinBatchSize,
ContextConstructorType&& ContextConstructor,
BodyType&& Body,
PreWorkType&& CurrentThreadWorkToDoBeforeHelping,
EParallelForFlags Flags = EParallelForFlags::None)
{
if (Num > 0)
{
const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
OutContexts.Reset(NumContexts);
for (int32 ContextIndex = 0; ContextIndex < NumContexts; ++ContextIndex)
{
OutContexts.Emplace(ContextConstructor(ContextIndex, NumContexts));
}
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Forward<BodyType>(Body), Forward<PreWorkType>(CurrentThreadWorkToDoBeforeHelping), Flags, TArrayView<ContextType>(OutContexts));
}
}
/**
* General purpose parallel for that uses the taskgraph
* @param DebugName; ProfilingScope and DebugName
* @param OutContexts; Array that will hold the user-defined, task-level context objects (allocated per parallel task)
* @param Num; number of calls of Body; Body(0), Body(1), ..., Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
*/
template <typename ContextType, typename ContextAllocatorType, typename BodyType, typename PreWorkType>
inline void ParallelForWithPreWorkWithTaskContext(
const TCHAR* DebugName,
TArray<ContextType, ContextAllocatorType>& OutContexts,
int32 Num,
int32 MinBatchSize,
BodyType&& Body,
PreWorkType&& CurrentThreadWorkToDoBeforeHelping,
EParallelForFlags Flags = EParallelForFlags::None)
{
if (Num > 0)
{
const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
OutContexts.Reset();
OutContexts.AddDefaulted(NumContexts);
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Forward<BodyType>(Body), Forward<PreWorkType>(CurrentThreadWorkToDoBeforeHelping), Flags, TArrayView<ContextType>(OutContexts));
}
}
/**
* General purpose parallel for that uses the taskgraph
* @param DebugName; ProfilingScope and DebugName
* @param Contexts; User-privided array of user-defined task-level context objects
* @param Num; number of calls of Body; Body(0), Body(1), ..., Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
*/
template <typename ContextType, typename BodyType, typename PreWorkType>
inline void ParallelForWithPreWorkWithExistingTaskContext(
const TCHAR* DebugName,
TArrayView<ContextType> Contexts,
int32 Num,
int32 MinBatchSize,
BodyType&& Body,
PreWorkType&& CurrentThreadWorkToDoBeforeHelping,
EParallelForFlags Flags = EParallelForFlags::None)
{
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Forward<BodyType>(Body), Forward<PreWorkType>(CurrentThreadWorkToDoBeforeHelping), Flags, Contexts);
}
/**
* General purpose parallel for that uses the taskgraph. This variant constructs for the caller a user-defined context
* object for each task that may get spawned to do work, and passes it on to the loop body to give it a task-local
* "workspace" that can be mutated without need for synchronization primitives. For this variant, the user provides a
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:734
Scope: file
Source code excerpt:
* @param DebugName; ProfilingScope and Debugname
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param ContextConstructor; Function to call to initialize each task context allocated for the operation
* @param Body; Function to call from multiple threads
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename ContextAllocatorType, typename ContextConstructorType, typename FunctionType>
inline void ParallelForWithTaskContext(const TCHAR* DebugName, TArray<ContextType, ContextAllocatorType>& OutContexts, int32 Num, int32 MinBatchSize, const ContextConstructorType& ContextConstructor, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
if (Num > 0)
{
const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
OutContexts.Reset();
OutContexts.AddUninitialized(NumContexts);
for (int32 ContextIndex = 0; ContextIndex < NumContexts; ++ContextIndex)
{
new(&OutContexts[ContextIndex]) ContextType(ContextConstructor(ContextIndex, NumContexts));
}
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<ContextType>(OutContexts));
}
}
/**
* General purpose parallel for that uses the taskgraph. This variant constructs for the caller a user-defined context
* object for each task that may get spawned to do work, and passes it on to the loop body to give it a task-local
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:763
Scope: file
Source code excerpt:
* @param DebugName; ProfilingScope and Debugname
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename ContextAllocatorType, typename FunctionType>
inline void ParallelForWithTaskContext(const TCHAR* DebugName, TArray<ContextType, ContextAllocatorType>& OutContexts, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
if (Num > 0)
{
const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
OutContexts.Reset();
OutContexts.AddDefaulted(NumContexts);
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<ContextType>(OutContexts));
}
}
/**
* General purpose parallel for that uses the taskgraph. This variant takes an array of user-defined context
* objects for each task that may get spawned to do work (one task per context at most), and passes them to
#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:786
Scope: file
Source code excerpt:
* @param Contexts; User-privided array of user-defined task-level context objects
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename FunctionType>
inline void ParallelForWithExistingTaskContext(TArrayView<ContextType> Contexts, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
ParallelForImpl::ParallelForInternal(TEXT("ParallelFor Task"), Num, MinBatchSize, Body, [](){}, Flags, Contexts);
}
/**
* General purpose parallel for that uses the taskgraph. This variant takes an array of user-defined context
* objects for each task that may get spawned to do work (one task per context at most), and passes them to
* the loop body to give it a task-local "workspace" that can be mutated without need for synchronization primitives.
* @param DebugName; ProfilingScope and Debugname
* @param Contexts; User-privided array of user-defined task-level context objects
* @param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
* @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers
* @param Body; Function to call from multiple threads
* @param Flags; Used to customize the behavior of the ParallelFor if needed.
* Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename FunctionType>
inline void ParallelForWithExistingTaskContext(const TCHAR* DebugName, TArrayView<ContextType> Contexts, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, Contexts);
}
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/Animation/AnimationSequenceCompiler.cpp:230
Scope (from outer to inner):
file
namespace UE::Anim
function void FAnimSequenceCompilingManager::ProcessAnimSequences
Source code excerpt:
ProcessAnimSequences(bLimitExecutionTime);
UpdateCompilationNotification();
}
void FAnimSequenceCompilingManager::ProcessAnimSequences(bool bLimitExecutionTime, int32 MinBatchSize)
{
const int32 NumRemaining = GetNumRemainingAssets();
const int32 MaxToProcess = bLimitExecutionTime ? FMath::Max(64, NumRemaining / 10) : INT32_MAX;
FObjectCacheContextScope ObjectCacheScope;
if (NumRemaining && NumRemaining >= MinBatchSize)
{
TSet<UAnimSequence*> SequencesToProcess;
for (TWeakObjectPtr<UAnimSequence>& AnimSequence : RegisteredAnimSequences)
{
if (AnimSequence.IsValid())
{
SequencesToProcess.Add(AnimSequence.Get());
}
}
{
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/Animation/AnimationSequenceCompiler.h:36
Scope (from outer to inner):
file
namespace UE::Anim
class class FAnimSequenceCompilingManager : public IAssetCompilingManager
Source code excerpt:
void FinishCompilation(TArrayView<UAnimSequence* const> InAnimSequences);
void FinishCompilation(TArrayView<USkeleton* const> InSkeletons);
protected:
virtual void ProcessAsyncTasks(bool bLimitExecutionTime = false) override;
void ProcessAnimSequences(bool bLimitExecutionTime, int32 MinBatchSize = 1);
void PostCompilation(TArrayView<UAnimSequence* const> InAnimSequences);
void ApplyCompilation(UAnimSequence* InAnimSequence);
void UpdateCompilationNotification();
void OnPostReachabilityAnalysis();
private:
friend class FAssetCompilingManager;
TSet<TWeakObjectPtr<UAnimSequence>> RegisteredAnimSequences;
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:6
Scope (from outer to inner):
file
namespace DistributedShaderCompilerVariables
Source code excerpt:
#include "ShaderCompiler.h"
namespace DistributedShaderCompilerVariables
{
//TODO: Remove the XGE doublet
int32 MinBatchSize = 50;
FAutoConsoleVariableRef CVarXGEShaderCompileMinBatchSize(
TEXT("r.XGEShaderCompile.MinBatchSize"),
MinBatchSize,
TEXT("This CVar is deprecated, please use r.ShaderCompiler.DistributedMinBatchSize"),
ECVF_Default);
FAutoConsoleVariableRef CVarDistributedMinBatchSize(
TEXT("r.ShaderCompiler.DistributedMinBatchSize"),
MinBatchSize,
TEXT("Minimum number of shaders to compile with a distributed controller.\n")
TEXT("Smaller number of shaders will compile locally."),
ECVF_Default);
static int32 GDistributedControllerTimeout = 15 * 60;
static FAutoConsoleVariableRef CVarDistributedControllerTimeout(
TEXT("r.ShaderCompiler.DistributedControllerTimeout"),
GDistributedControllerTimeout,
TEXT("Maximum number of seconds we expect to pass between getting distributed controller complete a task (this is used to detect problems with the distribution controllers).")
);
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:172
Scope (from outer to inner):
file
function int32 FShaderCompileDistributedThreadRunnable_Interface::CompilingLoop
Source code excerpt:
// Grab as many jobs from the job queue as we can
const EShaderCompileJobPriority Priority = (EShaderCompileJobPriority)PriorityIndex;
const int32 MinBatchSize = (Priority == EShaderCompileJobPriority::Low) ? 1 : DistributedShaderCompilerVariables::MinBatchSize;
const int32 NumJobs = Manager->AllJobs.GetPendingJobs(EShaderCompilerWorkerType::Distributed, Priority, MinBatchSize, INT32_MAX, PendingJobs);
if (NumJobs > 0)
{
UE_LOG(LogShaderCompilers, Verbose, TEXT("Started %d 'Distributed' shader compile jobs with '%s' priority"),
NumJobs,
ShaderCompileJobPriorityToString((EShaderCompileJobPriority)PriorityIndex));
}
if (PendingJobs.Num() >= DistributedShaderCompilerVariables::MinBatchSize)
{
break;
}
}
}
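In practical terms (using the default value of 50 as an illustration): a queue of 30 normal-priority shader jobs stays below the distributed threshold, so those shaders fall through to the local compilation path, while a queue of 50 or more is dispatched to the distributed controller. Note that in the excerpt above, Low-priority jobs use a minimum batch size of 1 rather than the cvar value.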
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:197
Scope (from outer to inner):
file
function int32 FShaderCompileDistributedThreadRunnable_Interface::CompilingLoop
Source code excerpt:
// Increase the batch size when more jobs are queued/in flight.
// Build farm is much more prone to pool oversubscription, so make sure the jobs are submitted in batches of at least MinBatchSize
int MinJobsPerBatch = GIsBuildMachine ? DistributedShaderCompilerVariables::MinBatchSize : 1;
// Just to provide typical numbers: the number of total jobs is usually in tens of thousands at most, oftentimes in low thousands. Thus JobsPerBatch when calculated as a log2 rarely reaches the value of 16,
// and that seems to be a sweet spot: lowering it does not result in faster completion, while increasing the number of jobs per batch slows it down.
const uint32 JobsPerBatch = FMath::Max(MinJobsPerBatch, FMath::FloorToInt(FMath::LogX(2.f, PendingJobs.Num() + NumDispatchedJobs)));
UE_LOG(LogShaderCompilers, Log, TEXT("Current jobs: %d, Batch size: %d, Num Already Dispatched: %d"), PendingJobs.Num(), JobsPerBatch, NumDispatchedJobs);
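A worked example of the batch-size formula above (numbers illustrative): with 5000 pending jobs and none yet dispatched, FloorToInt(log2(5000)) = 12, so on a developer machine (MinJobsPerBatch = 1) batches of 12 jobs are submitted; on a build machine with the default cvar value, MinJobsPerBatch = 50 wins and each batch contains at least 50 jobs.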
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/SkinnedAssetCompiler.cpp:280
Scope (from outer to inner):
file
function void FSkinnedAssetCompilingManager::AddSkinnedAssets
Source code excerpt:
TRACE_CPUPROFILER_EVENT_SCOPE(FSkinnedAssetCompilingManager::AddSkinnedAssets)
check(IsInGameThread());
// Wait until we gather enough mesh to process
// to amortize the cost of scanning components
//ProcessSkinnedAssets(32 /* MinBatchSize */);
for (USkinnedAsset* SkinnedAsset : InSkinnedAssets)
{
check(SkinnedAsset->AsyncTask != nullptr);
RegisteredSkinnedAsset.Emplace(SkinnedAsset);
}
UpdateCompilationNotification();
}
void FSkinnedAssetCompilingManager::FinishCompilation(TArrayView<USkinnedAsset* const> InSkinnedAssets)
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/SkinnedAssetCompiler.cpp:407
Scope (from outer to inner):
file
function void FSkinnedAssetCompilingManager::ProcessSkinnedAssets
Source code excerpt:
void FSkinnedAssetCompilingManager::Reschedule()
{
}
void FSkinnedAssetCompilingManager::ProcessSkinnedAssets(bool bLimitExecutionTime, int32 MinBatchSize)
{
using namespace SkinnedAssetCompilingManagerImpl;
TRACE_CPUPROFILER_EVENT_SCOPE(FSkinnedAssetCompilingManager::ProcessSkinnedAssets);
const int32 NumRemainingMeshes = GetNumRemainingJobs();
// Spread out the load over multiple frames but if too many meshes, convergence is more important than frame time
const int32 MaxMeshUpdatesPerFrame = bLimitExecutionTime ? FMath::Max(64, NumRemainingMeshes / 10) : INT32_MAX;
FObjectCacheContextScope ObjectCacheScope;
if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
{
TSet<USkinnedAsset*> SkinnedAssetsToProcess;
for (TWeakObjectPtr<USkinnedAsset>& SkinnedAsset : RegisteredSkinnedAsset)
{
if (SkinnedAsset.IsValid())
{
SkinnedAssetsToProcess.Add(SkinnedAsset.Get());
}
}
{
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/StaticMeshCompiler.cpp:344
Scope (from outer to inner):
file
function void FStaticMeshCompilingManager::AddStaticMeshes
Source code excerpt:
TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshCompilingManager::AddStaticMeshes)
check(IsInGameThread());
// Wait until we gather enough mesh to process
// to amortize the cost of scanning components
//ProcessStaticMeshes(32 /* MinBatchSize */);
for (UStaticMesh* StaticMesh : InStaticMeshes)
{
check(StaticMesh->AsyncTask != nullptr);
RegisteredStaticMesh.Emplace(StaticMesh);
}
TRACE_COUNTER_SET(QueuedStaticMeshCompilation, GetNumRemainingMeshes());
}
void FStaticMeshCompilingManager::FinishCompilation(TArrayView<UStaticMesh* const> InStaticMeshes)
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/StaticMeshCompiler.cpp:696
Scope (from outer to inner):
file
function void FStaticMeshCompilingManager::ProcessStaticMeshes
Source code excerpt:
}
}
}
}
void FStaticMeshCompilingManager::ProcessStaticMeshes(bool bLimitExecutionTime, int32 MinBatchSize)
{
using namespace StaticMeshCompilingManagerImpl;
LLM_SCOPE(ELLMTag::StaticMesh);
TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshCompilingManager::ProcessStaticMeshes);
const int32 NumRemainingMeshes = GetNumRemainingMeshes();
// Spread out the load over multiple frames but if too many meshes, convergence is more important than frame time
const int32 MaxMeshUpdatesPerFrame = bLimitExecutionTime ? FMath::Max(64, NumRemainingMeshes / 10) : INT32_MAX;
FObjectCacheContextScope ObjectCacheScope;
if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
{
TSet<UStaticMesh*> StaticMeshesToProcess;
for (TWeakObjectPtr<UStaticMesh>& StaticMesh : RegisteredStaticMesh)
{
if (StaticMesh.IsValid())
{
StaticMeshesToProcess.Add(StaticMesh.Get());
}
}
{
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/StaticMeshCompiler.cpp:706
Scope (from outer to inner):
file
function void FStaticMeshCompilingManager::ProcessStaticMeshes
Source code excerpt:
FObjectCacheContextScope ObjectCacheScope;
if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
{
TSet<UStaticMesh*> StaticMeshesToProcess;
for (TWeakObjectPtr<UStaticMesh>& StaticMesh : RegisteredStaticMesh)
{
if (StaticMesh.IsValid())
{
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Public/SkinnedAssetCompiler.h:87
Scope (from outer to inner):
file
class class FSkinnedAssetCompilingManager : IAssetCompilingManager
Source code excerpt:
bool bHasShutdown = false;
TSet<TWeakObjectPtr<USkinnedAsset>> RegisteredSkinnedAsset;
TUniquePtr<FAsyncCompilationNotification> Notification;
void FinishCompilationsForGame();
void Reschedule();
void ProcessSkinnedAssets(bool bLimitExecutionTime, int32 MinBatchSize = 1);
void UpdateCompilationNotification();
void PostCompilation(USkinnedAsset* SkinnedAsset);
void PostCompilation(TArrayView<USkinnedAsset* const> InSkinnedAssets);
void OnPostReachabilityAnalysis();
FDelegateHandle PostReachabilityAnalysisHandle;
void OnPreGarbageCollect();
FDelegateHandle PreGarbageCollectHandle;
};
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Public/SoundWaveCompiler.h:89
Scope (from outer to inner):
file
class class FSoundWaveCompilingManager : IAssetCompilingManager
Source code excerpt:
void FinishCompilationForObjects(TArrayView<UObject* const> InObjects) override;
void UpdateCompilationNotification();
void PostCompilation(TArrayView<USoundWave* const> InCompiledSoundWaves);
void PostCompilation(USoundWave* SoundWave);
void ProcessSoundWaves(bool bLimitExecutionTime, int32 MinBatchSize = 1);
TArray<USoundWave*> GatherPendingSoundWaves();
/** Notification for the amount of pending sound wave compilations */
TUniquePtr<FAsyncCompilationNotification> Notification;
};
#endif
#if UE_ENABLE_INCLUDE_ORDER_DEPRECATED_IN_5_4
#include "CoreMinimal.h"
#include "AssetCompilingManager.h"
#Loc: <Workspace>/Engine/Source/Runtime/Engine/Public/StaticMeshCompiler.h:87
Scope (from outer to inner):
file
class class FStaticMeshCompilingManager : IAssetCompilingManager
Source code excerpt:
TSet<TWeakObjectPtr<UStaticMesh>> RegisteredStaticMesh;
TUniquePtr<FAsyncCompilationNotification> Notification;
void FinishCompilationsForGame();
void Reschedule();
void ProcessStaticMeshes(bool bLimitExecutionTime, int32 MinBatchSize = 1);
void UpdateCompilationNotification();
void PostCompilation(TArrayView<UStaticMesh* const> InStaticMeshes);
void PostCompilation(UStaticMesh* StaticMesh);
void OnPostReachabilityAnalysis();
FDelegateHandle PostReachabilityAnalysisHandle;
};
#endif // #if WITH_EDITOR
#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Private/Chaos/Framework/Parallel.cpp:73
Scope (from outer to inner):
file
function void Chaos::PhysicsParallelFor
Source code excerpt:
InCallable(Idx);
};
const bool bSingleThreaded = !!GSingleThreadedPhysics || bDisablePhysicsParallelFor || bForceSingleThreaded;
const EParallelForFlags Flags = (bSingleThreaded ? EParallelForFlags::ForceSingleThread : EParallelForFlags::None);
const int32 MinBatchSize = ((MaxNumWorkers > 0) && (InNum > MaxNumWorkers)) ? FMath::DivideAndRoundUp(InNum, MaxNumWorkers) : 1;
ParallelFor(TEXT("PhysicsParallelFor"), InNum, MinBatchSize, PassThrough, Flags);
//::ParallelFor(InNum, PassThrough, !!GSingleThreadedPhysics || bDisablePhysicsParallelFor || bForceSingleThreaded);
}
void Chaos::PhysicsParallelForRange(int32 InNum, TFunctionRef<void(int32, int32)> InCallable, const int32 InMinBatchSize, bool bForceSingleThreaded)
{
TRACE_CPUPROFILER_EVENT_SCOPE(Chaos_PhysicsParallelFor);
using namespace Chaos;
// Passthrough for now, except with global flag to disable parallel
#if PHYSICS_THREAD_CONTEXT
const bool bIsInPhysicsSimContext = IsInPhysicsThreadContext();
#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Private/Chaos/Framework/Parallel.cpp:103
Scope (from outer to inner):
file
function void Chaos::PhysicsParallelForRange
Source code excerpt:
}
NumWorkers = FMath::Min(NumWorkers, InNum);
NumWorkers = FMath::Min(NumWorkers, MaxNumWorkers);
check(NumWorkers > 0);
int32 BatchSize = FMath::DivideAndRoundUp<int32>(InNum, NumWorkers);
int32 MinBatchSize = FMath::Max(InMinBatchSize, MinRangeBatchSize);
// @todo(mlentine): Find a better batch size in this case
if (InNum < MinBatchSize)
{
NumWorkers = 1;
BatchSize = InNum;
}
else
{
while (BatchSize < MinBatchSize && NumWorkers > 1)
{
NumWorkers /= 2;
BatchSize = FMath::DivideAndRoundUp<int32>(InNum, NumWorkers);
}
}
TArray<int32> RangeIndex;
RangeIndex.Add(0);
for (int32 i = 1; i <= NumWorkers; i++)
{
int32 PrevEnd = RangeIndex[i - 1];
int32 NextEnd = FMath::Min(BatchSize + RangeIndex[i - 1], InNum);
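As a worked example of the halving loop above (values illustrative): with InNum = 100, an effective MinBatchSize of 30, and an initial NumWorkers of 8, the first BatchSize is DivUp(100, 8) = 13, which is below 30, so workers halve to 4 (BatchSize 25, still below 30) and then to 2, giving BatchSize 50 and satisfying the minimum; the work is split into two ranges of 50.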
#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Private/Chaos/Framework/Parallel.cpp:170
Scope (from outer to inner):
file
function void Chaos::PhysicsParallelForWithContext
Source code excerpt:
InCallable(Idx, ContextIndex);
};
const bool bSingleThreaded = !!GSingleThreadedPhysics || bDisablePhysicsParallelFor || bForceSingleThreaded;
const EParallelForFlags Flags = bSingleThreaded ? (EParallelForFlags::ForceSingleThread) : (EParallelForFlags::None);
const int32 MinBatchSize = ((MaxNumWorkers > 0) && (InNum > MaxNumWorkers)) ? FMath::DivideAndRoundUp(InNum, MaxNumWorkers) : 1;
// Unfortunately ParallelForWithTaskContext takes an array of context objects - we don't use it and in our case
// it ends up being an array where array[index] = index.
// The reason we don't need it is that our ContextCreator returns the context index we want to use on a given
// worker thread, and this is passed to the user function. The user function can just captures its array of
// contexts and use the context indeex to get its context from it.
TArray<int32, TInlineAllocator<16>> Contexts;
::ParallelForWithTaskContext(TEXT("PhysicsParallelForWithContext"), Contexts, InNum, MinBatchSize, InContextCreator, PassThrough, Flags);
}
//class FRecursiveDivideTask
//{
// TFuture<void> ThisFuture;
// TFunctionRef<void(int32)> Callable;
//
// int32 Begin;
// int32 End;
//
#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Public/Chaos/Framework/Parallel.h:3
Scope (from outer to inner):
file
namespace Chaos
Source code excerpt:
#include "Templates/Function.h"
namespace Chaos
{
void CHAOS_API PhysicsParallelForRange(int32 InNum, TFunctionRef<void(int32, int32)> InCallable, const int32 MinBatchSize, bool bForceSingleThreaded = false);
void CHAOS_API PhysicsParallelFor(int32 InNum, TFunctionRef<void(int32)> InCallable, bool bForceSingleThreaded = false);
void CHAOS_API InnerPhysicsParallelForRange(int32 InNum, TFunctionRef<void(int32, int32)> InCallable, const int32 MinBatchSize, bool bForceSingleThreaded = false);
void CHAOS_API InnerPhysicsParallelFor(int32 InNum, TFunctionRef<void(int32)> InCallable, bool bForceSingleThreaded = false);
void CHAOS_API PhysicsParallelForWithContext(int32 InNum, TFunctionRef<int32 (int32, int32)> InContextCreator, TFunctionRef<void(int32, int32)> InCallable, bool bForceSingleThreaded = false);
//void CHAOS_API PhysicsParallelFor_RecursiveDivide(int32 InNum, TFunctionRef<void(int32)> InCallable, bool bForceSingleThreaded = false);
CHAOS_API extern int32 MaxNumWorkers;
CHAOS_API extern int32 SmallBatchSize;
CHAOS_API extern int32 LargeBatchSize;
#if UE_BUILD_SHIPPING
const bool bDisablePhysicsParallelFor = false;
const bool bDisableParticleParallelFor = false;
#Loc: <Workspace>/Engine/Source/Runtime/Projects/Private/PluginManager.cpp:1131
Scope (from outer to inner):
file
function void FPluginManager::FindPluginsInDirectory
Source code excerpt:
TArray<FString> DirectoriesToVisit;
DirectoriesToVisit.Add(PluginsDirectory);
FScopedSlowTask SlowTask(1000.f); // Pick an arbitrary amount of work that is resiliant to some floating point multiplication & division
constexpr int32 MinBatchSize = 1;
TArray<TArray<FString>> DirectoriesToVisitNext;
FRWLock FoundFilesLock;
while (DirectoriesToVisit.Num() > 0)
{
const float TotalWorkRemaining = SlowTask.TotalAmountOfWork - SlowTask.CompletedWork - SlowTask.CurrentFrameScope;
SlowTask.EnterProgressFrame(TotalWorkRemaining);
const int32 UnitsOfWorkTodoThisLoop = DirectoriesToVisit.Num();
ParallelForWithTaskContext(TEXT("FindPluginsInDirectory.PF"),
DirectoriesToVisitNext,
DirectoriesToVisit.Num(),
MinBatchSize,
[&FoundFilesLock, &FileNames, &DirectoriesToVisit, &PlatformFile](TArray<FString>& OutDirectoriesToVisitNext, int32 Index)
{
// Track where we start pushing sub-directories to because we might want to discard them (if we end up finding a .uplugin).
// Because of how `ParallelForWithTaskContext()` works, this array may already be populated from another execution,
// so we have to be targeted about what we clear from the array.
const int32 StartingDirIndex = OutDirectoriesToVisitNext.Num();
FFindPluginsInDirectory_Visitor Visitor(OutDirectoriesToVisitNext); // This visitor writes directly to `OutDirectoriesToVisitNext`, which is why we have to manage its contents
PlatformFile.IterateDirectory(*DirectoriesToVisit[Index], Visitor);
if (!Visitor.FoundPluginFile.IsEmpty())
{
#Loc: <Workspace>/Engine/Source/Runtime/Projects/Private/PluginManager.cpp:1143
Scope (from outer to inner):
file
function void FPluginManager::FindPluginsInDirectory
Source code excerpt:
DirectoriesToVisitNext,
DirectoriesToVisit.Num(),
MinBatchSize,
[&FoundFilesLock, &FileNames, &DirectoriesToVisit, &PlatformFile](TArray<FString>& OutDirectoriesToVisitNext, int32 Index)
{
// Track where we start pushing sub-directories to because we might want to discard them (if we end up finding a .uplugin).
// Because of how `ParallelForWithTaskContext()` works, this array may already be populated from another execution,
// so we have to be targeted about what we clear from the array.
const int32 StartingDirIndex = OutDirectoriesToVisitNext.Num();
#Loc: <Workspace>/Engine/Source/Runtime/Renderer/Private/DecalRenderingShared.cpp:356
Scope (from outer to inner):
file
function FTransientDecalRenderDataList BuildVisibleDecalList
Source code excerpt:
{
TChunkedArray<FTransientDecalRenderData> VisibleDecals;
};
TArray<FVisibleDecalListContext> Contexts;
const int32 MinBatchSize = 128;
ParallelForWithTaskContext(
TEXT("BuildVisibleDecalList_Parallel"),
Contexts,
Decals.Num(),
MinBatchSize,
[Decals, &View, ShaderPlatform, bIsPerspectiveProjection, FadeMultiplier](FVisibleDecalListContext& Context, int32 ItemIndex)
{
FTaskTagScope TaskTagScope(ETaskTag::EParallelRenderingThread);
const FDeferredDecalProxy* DecalProxy = Decals[ItemIndex];
FTransientDecalRenderData Data;
if (ProcessDecal(DecalProxy, View, FadeMultiplier, ShaderPlatform, bIsPerspectiveProjection, Data))
{
Context.VisibleDecals.AddElement(MoveTemp(Data));
#Loc: <Workspace>/Engine/Source/Runtime/Renderer/Private/RayTracing/RayTracing.cpp:293
Scope (from outer to inner):
file
namespace RayTracing
function void GatherRelevantPrimitives
Source code excerpt:
TChunkedArray<Nanite::CoarseMeshStreamingHandle> UsedCoarseMeshStreamingHandles;
TChunkedArray<FPrimitiveSceneInfo*> DirtyCachedRayTracingPrimitives;
};
TArray<FGatherRelevantPrimitivesContext> Contexts;
const int32 MinBatchSize = 128;
ParallelForWithTaskContext(
TEXT("GatherRayTracingRelevantPrimitives_Parallel"),
Contexts,
Scene.PrimitiveSceneProxies.Num(),
MinBatchSize,
[&Scene, &View, bGameView](FGatherRelevantPrimitivesContext& Context, int32 PrimitiveIndex)
{
// Get primitive visibility state from culling
if (!View.PrimitiveRayTracingVisibilityMap[PrimitiveIndex])
{
return;
}
const ERayTracingPrimitiveFlags Flags = Scene.PrimitiveRayTracingFlags[PrimitiveIndex];
check(!EnumHasAnyFlags(Flags, ERayTracingPrimitiveFlags::Exclude));
#Loc: <Workspace>/Engine/Source/Runtime/Renderer/Private/RayTracing/RayTracing.cpp:1302
Scope (from outer to inner):
file
namespace RayTracing
function bool GatherWorldInstancesForView
function void DoTask
Source code excerpt:
const uint32 BaseCachedStaticInstanceIndex = RayTracingScene.AddInstancesUninitialized(NumCachedStaticSceneInstances);
const uint32 BaseCachedVisibleMeshCommandsIndex = VisibleRayTracingMeshCommands.AddUninitialized(NumCachedStaticVisibleMeshCommands);
TArrayView<FVisibleRayTracingMeshCommand> CachedStaticVisibleRayTracingMeshCommands = TArrayView<FVisibleRayTracingMeshCommand>(VisibleRayTracingMeshCommands.GetData() + BaseCachedVisibleMeshCommandsIndex, NumCachedStaticVisibleMeshCommands);
const int32 MinBatchSize = 128;
ParallelFor(
TEXT("RayTracingScene_AddCachedStaticInstances_ParallelFor"),
RelevantCachedStaticPrimitives.Num(),
MinBatchSize,
[this, BaseCachedStaticInstanceIndex, CachedStaticVisibleRayTracingMeshCommands](int32 Index)
{
const FRelevantPrimitive& RelevantPrimitive = RelevantCachedStaticPrimitives[Index];
const int32 PrimitiveIndex = RelevantPrimitive.PrimitiveIndex;
FPrimitiveSceneInfo* SceneInfo = Scene.Primitives[PrimitiveIndex];
FPrimitiveSceneProxy* SceneProxy = Scene.PrimitiveSceneProxies[PrimitiveIndex];
ERayTracingPrimitiveFlags Flags = Scene.PrimitiveRayTracingFlags[PrimitiveIndex];
const FPersistentPrimitiveIndex PersistentPrimitiveIndex = RelevantPrimitive.PersistentPrimitiveIndex;
check(EnumHasAnyFlags(Flags, ERayTracingPrimitiveFlags::CacheInstances));