02: Findings Summary
Version 3.0, Updated April 2023 using Octane 2022.1 and C4D 2023.1.3
About This Guide
This is part of an ongoing series on the exciting world of resource management!
The goal of this series is to explore what C4D and Octane are doing under the hood, and how to tune our system resources and habits to make our workflow as zen and frustration-free as possible. Part 01 is an overview of how the whole system works. Part 02 (this one) is a rollup of all the findings of Part 03 and beyond, which will each be a deep-dive into a particular area of interest.
Guides in this series
- 01: Overview
- 02: Findings Summary <--This one
- 03: Polygons
- 04: Instances
- 05: Textures
- 06: Displacement (coming soon)
This guide is also available in PDF format here.
I. Intro
This guide will serve as a running summary of the takeaways learned in the other parts of the series, and will be updated every time a new part comes out, or something new is discovered.
Testing system
We’ll be using a 2021 Razer Blade 15 for these guides. Specs: i9-11900H CPU/64GB RAM/RTX3080 Laptop GPU/16GB VRAM/4K 60Hz OLED. OS: Win 11 Home 22H2. Cinema 4D 2023.1/Octane Render 2022.1
Test scenes are 1280x1280, 30FPS. All unnecessary apps and processes have been shut down.
The Gold Standard
An Octane render is at its absolute best when all of the scene data (polygons, textures, instances, overhead, and everything else) fits completely in VRAM, and the system RAM does not fill up to 100% while processing.
C4D’s performance is at its absolute best when the frames per second reported in the viewport does not drop below the target FPS of the scene. This is covered in detail at the end of the Intro guide for this series.
Overhead
When we first launch C4D (assuming we remembered to shut everything else down), we have ~12.5 GB out of 16 GB of VRAM to use for Octane, ~57 GB out of 64 GB of system RAM to use for pre-processing, and are running at 300-350 FPS in the viewport. This leaves us a lot of room to add geometry, textures, and other stuff so our render isn’t just a white square.
II. Findings from the polygons guide
A more detailed breakdown of this information can be found in the Polygons Guide in this series.
Key Takeaways
- In C4D, one million quads takes about 70 MB (0.07GB) of RAM.
- Octane can only use triangles, not quads or n-gons.
- Two million triangles use 117 MB of RAM in C4D.
- Octane uses ~360 MB (0.36 GB) of VRAM for the same two million tris.
- 10-16 million quads is where performance starts to take a noticeable hit - time to start looking into optimizing.
- ~32 million baked quads or ~40 million subdivided quads fills the VRAM of this test machine.
- Out-of-core memory is much slower, but allows us to render even more polygons if necessary.
Pre-processing
In C4D, polygons are either baked (editable) and stored in a C4D file, or generated on the fly by generators such as primitives (spheres, cubes, etc), NURBS objects (Sweeps, Lathes, etc), or other stuff like Subdivision Surface objects. All generated polygons are baked as part of the pre-processing phase of a render.
Important: C4D recognizes triangles, quads (4-sided) and n-gon (n-sided) polygons, where Octane only recognizes triangles, so all geometry is converted to triangles prior to being sent to Octane to render.
Subdivision strategies
C4D’s Subdivision Surface object creates the newly subdivided polygons and stores them in RAM, which allows us to see the results of the subdivided mesh in the viewport, but takes up a little more RAM and VRAM when processing.
Octane’s Object Tag allows for subdividing at render time, which lets C4D take up less VRAM during the render, and keeps the viewport FPS up, but does not show what the mesh will really look like until the scene is rendered.
Polygons and Memory
Ctrl-I gives a good idea of how much RAM the polygon objects in the scene will use. To see the amount of VRAM used (after a render), we can go to Octane Settings>Settings Tab>Device Settings.
The same number of polygons takes up less RAM than VRAM, but we want to keep them in VRAM (not push them to Out-of-core RAM) during the render to make sure the render is as fast as possible.
0.37 GB per 1,000,000 quads (2M tris) is a good guideline for estimating VRAM usage.
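As a rough illustration of that guideline, here is a minimal Python sketch. The function names are made up for this example, and the 12.5 GB default is simply the free VRAM reported in the Overhead section for this test machine, so it will differ on other systems. The estimate only covers geometry, not textures, instances, or other overhead.

```python
# Rule of thumb from this guide: ~0.37 GB of VRAM per 1,000,000 quads (2M tris).
GB_PER_MILLION_QUADS = 0.37


def estimate_polygon_vram_gb(quads: int) -> float:
    """Estimate the VRAM (in GB) needed for a given number of quads."""
    return quads / 1_000_000 * GB_PER_MILLION_QUADS


def fits_in_vram(quads: int, available_gb: float = 12.5) -> bool:
    """Hypothetical helper: True if the geometry estimate stays within the free VRAM."""
    return estimate_polygon_vram_gb(quads) <= available_gb


if __name__ == "__main__":
    for quads in (1_000_000, 16_000_000, 32_000_000, 64_000_000):
        gb = estimate_polygon_vram_gb(quads)
        print(f"{quads:>11,} quads -> ~{gb:.2f} GB, fits in VRAM: {fits_in_vram(quads)}")
```

This is only a planning aid. The Device Settings readout after a render is still the number to trust.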
Getting a Polygon Count
Lots of polygons means lower performance, so it’s a good idea to keep an eye on how many are in the scene. Both Octane and C4D have methods for seeing how many polygons are in a scene.
In Octane, the Live Viewer has an overlay that displays the number of triangles the most recent render used. There is a more accurate count (to the triangle) in the GPU Information popup in the Help menu. The Octane Log will also display this value if the scene was rendered to the Picture Viewer.
In C4D, Ctrl-I (Cmd-I on the Mac) pops open an information window that shows how many polygons are in the scene. This isn’t super accurate for Octane because C4D treats a triangle, a quad, and an n-gon all as one polygon, so after splitting these into triangles, the number could be vastly different if it’s an all-quad or all-n-gon scene.
A Total Polycount can be turned on in the HUD (Shift-V > HUD tab). This relies on being in Polygons mode and selecting some geometry. This only works for baked polygons.
The Structure Manager will show a breakdown into Tris/Quads/N-gons for selected geometry. This also only works for baked polygons. After being in Polygons mode and selecting the baked geo, the Structure Manager can be found by going to the Attributes Manager, choosing the Mode menu, and Project Info, then clicking the Structure tab.
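To turn a Tris/Quads/N-gons breakdown into a rough triangle count, we can do the arithmetic ourselves. The sketch below assumes a quad splits into two triangles and an n-gon with n sides splits into n - 2 triangles (the usual result when no new points are added). The function name and inputs are hypothetical, and the exact count Octane reports may differ slightly.

```python
from typing import Iterable


def triangle_count(tris: int, quads: int, ngon_sides: Iterable[int] = ()) -> int:
    """Estimate triangles after conversion: tri = 1, quad = 2, n-sided n-gon = n - 2."""
    total = tris + 2 * quads
    for sides in ngon_sides:
        if sides < 3:
            raise ValueError("a polygon needs at least 3 sides")
        total += sides - 2
    return total


if __name__ == "__main__":
    # An all-quad scene doubles its polygon count once it is triangulated.
    print(triangle_count(tris=0, quads=1_000_000))
    # A mixed selection: 10 tris, 5 quads, one 5-sided and one 8-sided n-gon.
    print(triangle_count(tris=10, quads=5, ngon_sides=[5, 8]))
```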
Tips for Using Out-of-core (OOC) memory
Avoid using the Live Viewer when OOC is needed. Instead, use a lower polygon proxy object, fewer instances, and/or lower resolution textures when doing lighting and setup, and then only turn on the final high poly geometry or high res textures when rendering to the Picture Viewer.
Before the final render using OOC memory, restart the computer, and then pause ALL extra processes (cloud storage apps, anti-malware, etc.). This frees up the maximum amount of resources to limit the amount of swapping and pre-processing time.
RTX is not enabled while using OOC memory, so this may further slow the render down.
Polygon milestones of pain
- 1 million quads (2M tris) or less: Almost no impact on the system.
- 4 million quads (8M tris): Render pre-processing time is 5-6 seconds. Everything else is still fine.
- 16 million quads (32M tris): Render pre-processing time is 20-25 seconds. RAM and VRAM don’t fill up, and the viewport and playback FPS is still fine. It’s just a matter of how long we want to wait to see something happen if the geo changes.
- Somewhere between 16-32 million quads: RAM maxes out, so things slow way down. Viewport performance drops below 60, but it’s still above 30, so it’s real-time.
- 32 million quads (64M tris): This is about the max for baked polygons. RAM maxes out, pre-processing time is high.
- 40 million quads (80M tris): This is about the max for subdivided polygons. FPS in the viewport is in danger of dropping below 30, pre-processing time is really high.
- 64 million quads (~128M tris) will actually render if we change the settings to give Octane 20GB of Out-of-core memory. This takes 13 minutes of pre-processing time before it can even start to render. Viewport response is below 30FPS, and it’s laggy and stuttery. Good as a last resort, but not recommended. Definitely render to Picture Viewer with a polycount this high.
III. Findings from the instances guide
A more detailed breakdown of this information can be found in the Instances Guide in this series.
Instances are procedural copies of objects. These are usually found in Cloners and referred to as “clones”, but C4D has a standalone Instance Object which is essentially a single clone that’s not attached to an internal Matrix system.
Cinema 4D has three flavors of instances that can be chosen in a cloner or Instance object. Regular ol’ Instance, Render Instance, and Multi-instance. Octane doesn’t play well with Render Instance, so we’re going to ignore it here.
Regular Instances are similar to making copies by hand. Each one consumes the same amount of RAM and VRAM, and has the same impact on the viewport as the original source object. This means we’re limited in the number we can have due to how C4D handles individual objects before our system grinds to a halt. Most of what we can do to a regular geometry object, we can do to an instance (deformation, animation timing, texturing, etc.)
Multi-instances exist as a system which is treated as a single object. This gets around C4D’s limitation of the number of objects in a scene, but at the cost of versatility when it comes to individual animation, deformation, and texturing. The source object is only loaded once into RAM and VRAM, making it efficient with lots of high poly objects. Very large systems (millions of instances) will start to take a while to build and pre-process before a render, and have a negative impact on the viewport performance.
Octane Scatter is Octane’s native instance system. It works mostly like Multi-instance, but there are some key differences that make it better when visualizing huge systems (millions of instances +). The Display Mode actually makes a difference in whether it acts like regular Instances or Multi-Instance, and how fluid the viewport experience is. Since it doesn’t have a built-in grid system, it relies on the Matrix Object if a regular 3D grid is needed, and the Matrix itself can start slowing things down with lots of instances.
When to use what
We can get away with regular Instances when we just have a few hundred low-to-medium poly instances that don’t really impact our performance much. We need to use this mode if we want to deform the instances individually (say, set up a twist deformer that only affects certain clones as they get closer to it), or if our source geometry has keyframed animation in it that we want to offset per clone. There are also cases that we’ll run into where texturing doesn’t work as expected in other modes, so we may need to revert to regular Instances, and optimize our setup more if that happens.
We’ll want to switch to Multi-instances if the performance starts to lag due to having a ton of clones in the scene (thousands +). We may have to make some concessions about animation offsets or deformation (or find workarounds and hacks), but our working environment will be a lot smoother and we’ll still be able to reliably time animations or view the actual object in the viewport up to tens of thousands of clones.
We’ll want to investigate Scatter if we have a system that has tens of thousands of instances and we want better viewport performance, or if we reach a point where timing an animation is crucial and Multi-instance keeps stuttering at the loop point (frame 0) when we’re playing back in the viewport. There will also be times where Shader Effectors/Fields don’t work so well in Multi-instance, but we have too many objects to use a regular Instance - it may work ok in Scatter.
Individual Object/Instance Mode Limit
Regardless of which systems are used, C4D can only handle about 20,000 individual objects of any kind in the viewport before the FPS drops below 30 on this computer. If any of those objects are made of polygons, the number drops significantly. A Cloner or Instance object in Instance mode treats each clone as an individual object, and is therefore subject to this limitation.
The following applies to both individual objects or clones in Instance mode.
3,000 visible polygon objects (up to ~3k polys apiece) drops below 30 FPS in the viewport.
10,000 visible polygon objects is about where things get too sluggish to work with (~10 FPS). It also adds 8 seconds to the pre-processing time prior to a render for a one polygon source object.
At this point, we need to start seriously considering Multi-instance or Scatter if at all possible.
20,000 visible non-polygon objects (nulls, etc) is about the limit for working in real time. These can be used as placeholders that show position and swapped for real geometry prior to a render.
30,000 hidden objects (polygon or non-polygon) should still get us about real time performance. Pre-processing time gets up to 40 seconds with a 1 polygon source object, and RAM and VRAM can quickly max out if the polycount isn’t kept low.
100,000 polygon objects puts us into the single digits for FPS, even when they’re hidden from the viewport. The ridiculous pre-processing time (several minutes) means we should avoid getting our count up this high.
Multi-instances
C4D treats a Multi-instance system as a single object, which vastly increases the viewport and render performance. It comes at a cost: multi-instances can’t be individually deformed, have their properties modified, or have their animation offset, and a lot of versatility in texturing is lost.
A Cloner also has to rebuild itself on frame 0 of the animation, so if this takes any significant amount of time, it becomes difficult to time the first few frames of animation while C4D is rebuilding. On this computer, the rebuild usually becomes noticeable around the 30,000 clone mark when the Viewport Mode is set to Object and the source is fairly low poly, and at higher counts in the other modes.
Multi-instance has different Viewport Modes which are good for increasing viewport performance without affecting the render.
<30,000 multi-instances is usually a pretty fluid experience unless the source object is particularly heavy.
30,000 multi-instances is where the rebuild time will start to be noticeable and affect the FPS of the first few frames of the animation. If this timing is essential, we should consider moving to the Octane Scatter where this isn’t an issue.
80,000 multi-instances is about the limit of real-time navigation (and playback after the first few frames) in the viewport if the Viewport Mode is set to Object.
---Consider setting Viewport Mode to Points---
170,000 multi-instances with the Viewport Mode set to Points is where we can no longer reliably time the first few frames of an animation because of the rebuild time at frame 0.
400,000 multi-instances drops us below real time navigation if the Viewport Mode is set to Points.
---Consider setting Viewport Mode to None---
2,000,000-5,000,000 multi-instances is where viewport lag is bad enough that we’re going to want to switch the Viewport Mode to None. Navigation FPS in this mode is never an issue since it’s always just one hidden object. Animation FPS will continue to suffer more and more at the loop point as we add more clones from here.
10,000,000 multi-instances is where we start seeing significant pre-processing time.
15,000,000 multi-instances takes 40 seconds to redraw, 30 seconds of pre-processing, and gets close to maxing out the RAM during a render.
25,000,000 multi-instances is probably our upper working limit. It takes a minute to redraw, and 1:10 of pre-processing time. VRAM is at nearly 9GB at this point without any extra geometry, textures, etc.
25,000,000-30,000,000 multi-instances is where Octane stops being able to render the scene.
40,000,000 multi-instances is where C4D runs out of memory.
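The milestones above can be folded into a small lookup. This is only a sketch: the function names are hypothetical, and the thresholds are the ones measured on this test machine with a fairly low poly source object, so they will shift on other hardware or with heavier sources.

```python
def suggest_viewport_mode(multi_instances: int) -> str:
    """Suggest a Multi-instance Viewport Mode that keeps navigation near real time."""
    if multi_instances <= 80_000:
        return "Object"   # about the limit for real-time navigation in Object mode
    if multi_instances < 400_000:
        return "Points"   # Points drops below real time at around 400,000
    return "None"         # beyond that, hide the clones in the viewport


def within_working_limit(multi_instances: int) -> bool:
    """True if the count is at or below the ~25,000,000 upper working limit."""
    return multi_instances <= 25_000_000


if __name__ == "__main__":
    for count in (30_000, 170_000, 2_000_000, 30_000_000):
        print(f"{count:>10,}: mode {suggest_viewport_mode(count)}, "
              f"within working limit: {within_working_limit(count)}")
```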
Octane Scatter
Scatter is Octane’s native instancing system. Most of the time it’s used in a similar fashion to the Cloner in Multi-instance mode, but it can be set up to work more like Instance mode as well.
Things to consider with Scatter
Scatter does not have a built-in grid system like the Cloner. It’s most efficient in the Scatter Distribution Type set to Surface, and using a low-poly piece of geometry to spread instances across. If a grid pattern is desired, it works with a C4D Matrix object using the Vertex Distribution type. With huge systems, the Matrix can consume resources and slow everything down.
Scatter’s Display Mode drastically impacts performance. Line mode (default) is the most efficient and can display the most instances in the viewport before slowing down below real time (~200k on this test machine). Box, Circle, and Sphere modes can display progressively fewer instances, but show more information about the source object. In Object mode, Scatter acts like an Instance Cloner. Only 3,000 or so can be shown, but they can be individually deformed.
Display Rate is one of Octane’s super powers. The Cloner in Multi-instance mode can only display 100,000 points or so before we need to turn off viewport feedback altogether. Scatter can display about 200,000 lines, but if we have a system with say, two million instances, we can set the display rate to 10%, meaning only one in ten instances are shown for a total of 200k lines again. Two million instances will still render, but now we can at least see some sort of representation of where the instances are.
Build time with Scatter can take longer initially, but it doesn’t have to redraw when the animation loops around to frame 0 like Multi-instance. That combined with the Display Rate makes it a much better candidate for visualizing and timing the animation of very large systems.
Scatter doesn’t hide the source geometry (the thing being cloned) or the Surface (the thing it’s putting the clones on) from the viewport. Hiding them via the traffic lights works fine in most Display Modes, but in Object mode it ends up hiding all the clones as well. When Object mode is being used, just make the Y or Z coordinate of the source object some crazy high number to push it out of view.
Crash Resistance
When using a Matrix object as our Surface, things can get unstable over 5 million instances or so. To mitigate crashes, this is the best order to do things when we need to change the number of instances or the size, or something else related to the Matrix.
- Stop the Octane Live Viewer
- Hit the green check next to the Scatter to disable it.
- Hit the green check next to the Matrix to disable it.
- Adjust the Matrix count or size, or whatever.
- If any significant number of matrices were added, consider lowering the Display Rate in the Scatter
- Re-enable the Matrix.
- Re-enable the Scatter.
Stress Test
3,000 Scatter Instances is about the max for real time performance when the Display Mode is set to Object.
--- Adjust the Display Rate to get the number of visible instances close to these limits ---
20,000 Scatter Instances is about the max for 30 FPS using Display Mode: Sphere
25,000 is the max visible instances for 30 FPS using Display Mode: Circle
100,000 is the max visible instances for 30 FPS using Display Mode: Box
200,000 is the max visible instances for 30 FPS using Display Mode: Line
5,000,000 instances is where the Matrix starts to get unwieldy. Follow the steps above to avoid crashes. Build time for Scatter is about 17 seconds, 30 FPS is achievable with a Display Rate of 2.5%.
10,000,000 instances takes a full minute to build on this machine. 30 FPS at 0.7%. Pre-processing is at 40 seconds. VRAM is at 4.46 GB
15,000,000 instances takes 2:30 to build. Pre-processing is at 40 seconds, VRAM is at 5.6 GB
25,000,000 instances is about the upper limit. It takes 3:10 to build the Scatter (and more for the Matrix). Pre-processing is at 50 seconds, and VRAM is at 9.9 GB. RAM starts to max out as well. 30FPS can still be had at 0.1% Display Rate.
30,000,000 instances won’t render.
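A starting Display Rate can be worked out from the per-mode limits above. The sketch below is a simple ratio, with hypothetical names, using the 30 FPS limits from this test machine. For very large systems the Stress Test needed lower rates than this ratio gives (2.5% at 5,000,000 instances, for example), so treat the result as an upper bound and dial it down from there.

```python
# Max visible instances for ~30 FPS per Display Mode (this guide's test machine).
MAX_VISIBLE_AT_30_FPS = {
    "Object": 3_000,
    "Sphere": 20_000,
    "Circle": 25_000,
    "Box": 100_000,
    "Line": 200_000,
}


def display_rate_percent(total_instances: int, display_mode: str = "Line") -> float:
    """Upper-bound Display Rate (%) that keeps visible instances at the mode's limit."""
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    limit = MAX_VISIBLE_AT_30_FPS[display_mode]
    return min(100.0, limit * 100.0 / total_instances)


if __name__ == "__main__":
    # Two million instances in Line mode: show one in ten.
    print(display_rate_percent(2_000_000, "Line"))
    # A small system can show everything.
    print(display_rate_percent(1_000, "Box"))
```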
Resource Limit
At 25,000,000 multi-instances, the system still isn’t maxing out the VRAM since Octane only needs to load the source geometry into memory once. That means the source object can have a lot more polygons. The upper limit appears to be about 13.5 million triangles per instance for a grand total of 338 trillion polygons in the render.
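The grand total is just the instance count times the triangles per instance. A quick Python check of that arithmetic (the function name is only for illustration):

```python
def total_rendered_triangles(instances: int, tris_per_instance: int) -> int:
    """Total triangles in the render when every instance shares one source object."""
    return instances * tris_per_instance


if __name__ == "__main__":
    total = total_rendered_triangles(25_000_000, 13_500_000)
    print(f"{total:,} triangles (~{total / 1e12:.1f} trillion)")
```

Only the 13.5 million source triangles live in memory once. The rest of the total exists only as instance transforms.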
IV. Findings from the Textures Guide
Overview
In a nutshell, Octane imports and decompresses an external image file, and stores it in RAM. This image is contained in a shader (ideally an Octane ImageTexture shader). The ImageTexture shader has options to modify the bit depth and number of channels used. It also gives us the option to compress and cache a smaller version of the image if desired. When a render is initiated, the texture data is sent to VRAM along with the geometry and other scene stuff to be used by the GPU to produce the final render.
Import/Decompression
Octane needs to decompress nearly every file format/compression combination (PNG, JPG, EXR, BMP, GIF, TIF, TGA, etc.) to be able to use it.
This means that regardless of the file’s size on disk, the amount of VRAM and RAM it uses is always determined by the resolution, number of channels, and bit depth. If an image is 8192x8192, 8-bit, and RGB, then regardless if it’s saved as a 2MB PNG, 1MB JPEG, 20MB EXR, or 50MB TIF, it will use 256 MB of VRAM in Octane.
Resolution is the pixels across (Width) times the pixels high (Height).
Channels in Octane when calculating VRAM usage means either four (RGBA) or one (Gray). A 3-channel RGB is always treated as a 4-channel RGBA with regard to VRAM.
Bit depth for image files is usually stated in bits per channel (BPC). This is typically 8, 16, or 32. Octane displays bits per pixel (BPP) instead, based on the file’s actual channel count, so a 3-channel RGB file that’s 16 bits per channel will display as 48-bit (16BPC x 3 channels). But it will still use VRAM as if it’s a 4-channel RGBA image.
VRAM Usage (uncompressed)
Octane’s VRAM usage for an uncompressed RGB or RGBA image can be calculated using this formula:
(Width * Height * 4 * bit depth) / 8,388,608 = megabytes in VRAM
The VRAM usage for an uncompressed grayscale image (with the proper ImageTexture settings) is:
(Width * Height * bit depth) / 8,388,608 = megabytes in VRAM
Because of this, it’s always best to find the smallest possible resolution and bit depth for each texture that will suit the needs of the scene.
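The two formulas above can be wrapped in one small Python function. This is a sketch with hypothetical names. It assumes "bit depth" means bits per channel (8, 16, or 32), and that the single-channel case only applies when the ImageTexture is set up for grayscale as described in this guide.

```python
BITS_PER_MEGABYTE = 8 * 1024 * 1024  # 8,388,608 bits in one megabyte


def texture_vram_mb(width: int, height: int, bits_per_channel: int,
                    grayscale: bool = False) -> float:
    """Uncompressed texture size in VRAM (MB), following the formulas above.

    RGB and RGBA images both count as 4 channels; grayscale counts as 1.
    """
    channels = 1 if grayscale else 4
    return width * height * channels * bits_per_channel / BITS_PER_MEGABYTE


if __name__ == "__main__":
    # 8K, 8-bit RGB texture, regardless of its size on disk.
    print(texture_vram_mb(8192, 8192, 8))
    # 4K, 16-bit grayscale texture.
    print(texture_vram_mb(4096, 4096, 16, grayscale=True))
```

Running this for every texture in a scene and summing the results gives a quick sanity check before a render.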
ImageTexture Shader
The ImageTexture shader is where any imported bitmap image should live (as opposed to C4D’s native bitmap shader). It gives us a bit of control over the channels, bit depth, and compression.
Type Dropdown
Important: The default for this dropdown is Normal, which means 4-channel RGBA. That’s correct for any color image (RGB or RGBA), but it will convert a grayscale (1-channel) image to four channels, thereby quadrupling the VRAM used. If an image is natively single-channel grayscale (or you want to convert an RGB image to grayscale), this should be set to Float. This will reduce the channels from four to one, which will quarter the VRAM used.
Compression Button
This is a button that says “Automatic” under the Type dropdown. There are options in here for BCn compression which we’ll cover in a minute, but for 16- and 32-bit images, there’s also an option to specify 32- or 16-bit. Specifying 32-bit is pretty useless since Automatic already assumes that, but if we have a 32-bit image that doesn’t need that much detail, we can try reducing it to 16-bit which will halve the VRAM usage for that texture, and often will not have a visible effect in the render (pay extra attention to the effects on displacement though).
Reporting Tools
The Texture Manager does a calculation for VRAM when the image loads in, and NEVER CHANGES IT AGAIN, regardless of how much fiddling we’re doing in the ImageTexture node. These calculations are only correct for 8- and 16-bit uncompressed RGB 1:1 images, 32-bit uncompressed Grayscale 1:1 images, and 2:1 uncompressed HDRI images (which it doesn’t show in the Texture Manager anyway when loaded into an environment). For all other resolutions and bit depths, it’s probably wrong and should be ignored.
The Device Settings Window (Octane Settings>Settings Tab>Device Settings button) shows an accurate readout of how much VRAM was used, but only after a render has finished.
Here’s a table of actual VRAM usage (in MB) for common uncompressed image sizes/bit depths:
| Use | Resolution | RGB-8 | RGB-16 | RGB-32 | Gray-8 (f) | Gray-16 (f) | Gray-32 (f) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.5K Texture | 512x512 | 1 | 2 | 4 | 0.25 | 0.5 | 1 |
| 1K Texture | 1024x1024 | 4 | 8 | 16 | 1 | 2 | 4 |
| 2K Texture | 2048x2048 | 16 | 32 | 64 | 4 | 8 | 16 |
| 4K Texture | 4096x4096 | 64 | 128 | 256 | 16 | 32 | 64 |
| 8K Texture | 8192x8192 | 256 | 512 | 1,024 | 64 | 128 | 256 |
| 1K HDRI | 1024x512 | 2 | 4 | 8 | 0.5 | 1 | 2 |
| 2K HDRI | 2048x1024 | 8 | 16 | 32 | 2 | 4 | 8 |
| 4K HDRI | 4096x2048 | 32 | 64 | 128 | 8 | 16 | 32 |
| 8K HDRI | 8192x4096 | 128 | 256 | 512 | 32 | 64 | 128 |
| 16K HDRI | 16384x8192 | 512 | 1,024 | 2,048 | 128 | 256 | 512 |
| 24K HDRI | 24576x12288 | 1,152 | 2,304 | 4,608 | 288 | 576 | 1,152 |
BCn compression
This is worth reading the deep-dive guide for, but summed up very quickly:
Octane natively supports a type of lossy image compression called BCn that it does not need to decompress in order to use. BCn has either a 4:1 or 8:1 compression ratio, which can drastically cut down on the VRAM needed.
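To get a feel for the savings, we can divide an uncompressed size by the compression ratio. The sketch below uses a hypothetical function name and takes the ratio as an input, since which BCn variant gives 4:1 versus 8:1 depends on the algorithm and the source image and isn’t covered here.

```python
def bcn_vram_mb(uncompressed_mb: float, ratio: int) -> float:
    """Estimated VRAM (MB) after BCn compression at a 4:1 or 8:1 ratio."""
    if ratio not in (4, 8):
        raise ValueError("this guide only describes 4:1 and 8:1 BCn ratios")
    return uncompressed_mb / ratio


if __name__ == "__main__":
    # An 8K, 8-bit RGB texture uses 256 MB uncompressed.
    for ratio in (4, 8):
        print(f"{ratio}:1 -> {bcn_vram_mb(256, ratio)} MB")
```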
It can either temporarily build compressed versions of images using these algorithms and store them in a cache, or it can import and use DirectDraw Surface (.DDS) files which are already compressed with BCn algorithms.
If the caching method is used, an appropriate BCn algo is chosen in the ImageTexture shader’s compression button (it just says “Automatic” by default). This process can spike the CPU to 100% for up to several minutes for each image (depends on the resolution and bit depth) while it’s compressing, and trying to do several at a time can cause instability. Also, if the cache is cleared or the file needs to be used on a different machine, the images need to be re-cached, which can take a LOT of time if it’s trying to do it all at once, and there are a lot of images to compress/cache.
Cached files are stored here: C:\Users\<<yourusername>>\AppData\Local\OctaneRender\cache\
If .DDS files are used, an outside program like the one made by AMD, NVIDIA, or Intel needs to be used to convert the files. The proper BCn compression should be selected for each image, depending on the makeup and what it’s meant to be used for.
General Strategy
Don’t trust the file size of image textures - it gives a false sense of security :)
First see if we can get away with procedural textures instead of imported images. If not, seek out the lowest resolution/bit depth images we can use while not hurting the quality of the render. Always use the ImageTexture shader for bitmaps instead of the C4D native one. In the ImageTexture shader, set the type dropdown to Float for any grayscale image, and see about reducing 32-bit images to 16-bit using the compression button (just says “Automatic” by default) if it doesn’t hurt the image quality.
If the VRAM usage is still too high, we can look into BCn compression. For a few large images that need to be high res and/or 16-bit, we can try compressing/caching them to disk using BC6 or BC7. If we have a ton of images or need to use the file across multiple machines, we should consider looking into .DDS format and compressing the images accordingly.
Wrap Up
That’s it for right now, but this will get fleshed out more as more guides are written in this series, so stay tuned!
Author Notes
OG029 Resource Management: Findings, version 1.0, Last modified January 2023.
This guide originally appeared on https://be.net/scottbenson and https://help.otoy.com/hc/en-us/articles/212549326-OctaneRender-for-CINEMA-4D-Cheatsheet
All rights reserved.
The written guide may be distributed freely and can be used for personal or professional training, but not modified or sold. The assets distributed within this guide are either generated specifically for this guide and released as cc0, or sourced from cc0 sites, so they may be used for any reason, personal or commercial. The emoji font used here is Noto Color Emoji.