Performance Tuning
From Xugglewiki
This page is meant for tips and tricks on squeezing every last inch of performance out of Xuggler.
What Formats and Codecs Does Xuggler Support
In Xuggler 3.0 and later, to get the full list of configurable options do:
Mac & Linux:
java -cp $XUGGLE_HOME/share/java/jars/xuggle-xuggler.jar com.xuggle.xuggler.Configuration
Windows:
java -cp %XUGGLE_HOME%\share\java\jars\xuggle-xuggler.jar com.xuggle.xuggler.Configuration
This will print all container formats, codecs, and options to the screen.
How Can I Find Out Which Options I Can Set on a StreamCoder
In Xuggler 2.1 and later, run the same com.xuggle.xuggler.Configuration command shown in the previous section. It prints all container formats, codecs, and options to the screen. The codec options it lists are the ones you can set on an IStreamCoder with IStreamCoder.setProperty.
How Can I Affect BitRate and Other Options
IStreamCoder has many settings you can change, but whether a given setting is honored depends entirely on the codec. For example, most codecs treat the bit rate you set with IStreamCoder.setBitRate(int) only as guidance, so it can sometimes seem that your bit rate is being ignored. If that happens, the following tips might help:
- When using MediaTool objects (e.g. IMediaWriter):
  - Add a listener to your IMediaWriter and override the onAddStream event.
  - In that event, find the IStreamCoder we've set up by calling:
    event.getSource().getContainer().getStream(event.getStreamIndex()).getStreamCoder()
  - Follow the advice below.
- For audio:
  - Make sure that the IStreamCoder.Flags.FLAG_QSCALE flag is set to false. If it is set to true, Xuggler tries to honor the IStreamCoder.setGlobalQuality(...) setting at the expense of *everything* else, including bitrate:
    coder.setFlag(IStreamCoder.Flags.FLAG_QSCALE, false);
  - Try the IStreamCoder.setBitRate and .setBitRateTolerance methods for your codec.
  - Downsample the audio to a lower sample rate using the IAudioResampler object.
- For video:
  - Make sure that the IStreamCoder.Flags.FLAG_QSCALE flag is set to false. If it is set to true, Xuggler tries to honor the IStreamCoder.setGlobalQuality(...) setting at the expense of *everything* else, including bitrate:
    coder.setFlag(IStreamCoder.Flags.FLAG_QSCALE, false);
  - Try the IStreamCoder.setBitRate and .setBitRateTolerance methods for your coder.
  - Adjust the setNumPicturesInGroup setting. This sets the number of pictures in a group of pictures, which is how many frames come between key frames, and therefore the ratio of key frames to inter frames.
  - Try the IStreamCoder.setProperty methods to set various internal FFmpeg properties. See the FFmpeg documentation for what those are (things like macroblock size).
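The MediaTool tips above could be combined into a single listener, as in the sketch below. It assumes the Xuggler 3.x MediaTool classes (MediaListenerAdapter, IAddStreamEvent) and reuses the lookup chain given in the text. BitRateListener and all of the numbers (bit rates, tolerance, group size) are hypothetical. The codec may still treat them only as guidance.

```java
import com.xuggle.mediatool.MediaListenerAdapter;
import com.xuggle.mediatool.event.IAddStreamEvent;
import com.xuggle.xuggler.ICodec;
import com.xuggle.xuggler.IStreamCoder;

/** Hypothetical listener that tunes each stream coder as an IMediaWriter creates it. */
public class BitRateListener extends MediaListenerAdapter {
  // Illustrative targets in bits per second; choose values that suit your codec.
  static final int VIDEO_BIT_RATE = 250000;
  static final int AUDIO_BIT_RATE = 64000;

  @Override
  public void onAddStream(IAddStreamEvent event) {
    // Same lookup chain as described in the text.
    IStreamCoder coder = event.getSource().getContainer()
        .getStream(event.getStreamIndex()).getStreamCoder();

    // With FLAG_QSCALE on, setGlobalQuality(...) wins over the bit rate.
    coder.setFlag(IStreamCoder.Flags.FLAG_QSCALE, false);

    if (coder.getCodecType() == ICodec.Type.CODEC_TYPE_VIDEO) {
      coder.setBitRate(VIDEO_BIT_RATE);
      coder.setBitRateTolerance(VIDEO_BIT_RATE / 10); // hypothetical 10% tolerance
      coder.setNumPicturesInGroup(30);                // hypothetical: key frame every 30 frames
    } else if (coder.getCodecType() == ICodec.Type.CODEC_TYPE_AUDIO) {
      coder.setBitRate(AUDIO_BIT_RATE);
      coder.setBitRateTolerance(AUDIO_BIT_RATE / 10);
    }
  }
}
```

Register the listener with writer.addListener(new BitRateListener()) right after ToolFactory.makeWriter(...) and before you add any streams. That way the listener sees every onAddStream event.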
Other Tips
As for pointers to things that might help you: Xuggler really hands off all encoding and decoding to FFmpeg, so FFmpeg's settings determine all of this. We expose pretty much all the encoder and decoder settings through the setProperty interface, kludgy as it is. The problem is that the documentation of those options is very, very poor. But to get you started:
- x264-to-FFmpeg mapping: this explains how x264 options map to FFmpeg options. Along the way, it documents what many FFmpeg options do (for example, "b" sets the bitrate). See in particular the rate-control options: http://sites.google.com/site/linuxencoding/x264-ffmpeg-mapping
- FFmpeg documentation: this explains all the FFmpeg command-line options: http://ffmpeg.org/ffmpeg-doc.html
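The setProperty route could be wrapped in a small helper like the sketch below. The helper takes FFmpeg option names and values as "name=value" strings, for example "b=250000" (bit rate in bits per second) or "g=30" (group-of-pictures size). It assumes that IStreamCoder's setProperty(String, String) returns a negative value when FFmpeg rejects an option. FfmpegOptions is a hypothetical name, and options generally need to be set before the coder is opened.

```java
import com.xuggle.xuggler.IStreamCoder;

/** Hypothetical helper that sets FFmpeg codec options on an IStreamCoder by name. */
public final class FfmpegOptions {
  private FfmpegOptions() {}

  /** Splits "name=value" into {name, value}. */
  static String[] split(String setting) {
    int eq = setting.indexOf('=');
    if (eq <= 0)
      throw new IllegalArgumentException("expected name=value but got: " + setting);
    return new String[] { setting.substring(0, eq), setting.substring(eq + 1) };
  }

  /** Applies each setting; returns false if FFmpeg rejected any of them. */
  public static boolean apply(IStreamCoder coder, String... settings) {
    boolean allAccepted = true;
    for (String setting : settings) {
      String[] nameValue = split(setting);
      // Assumed: a negative return code means the option name or value was rejected.
      if (coder.setProperty(nameValue[0], nameValue[1]) < 0) {
        System.err.println("FFmpeg rejected option " + setting);
        allAccepted = false;
      }
    }
    return allAccepted;
  }
}
```

Typical use: FfmpegOptions.apply(coder, "b=250000", "g=30").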
Finding all configurable options
In Xuggler 3.0 and later, run the com.xuggle.xuggler.Configuration class as shown under "What Formats and Codecs Does Xuggler Support" above. It prints all container formats, codecs, and options to the screen.
What can I do to speed up my program?
The first thing you can do to speed up your program is *slow down*. If you haven't already, start measuring your performance and see where your problems are. I often use the JETM library to help with that.
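Before adopting a library such as JETM, you can start by timing suspect sections with System.nanoTime() and accumulating the totals for each section. The SectionTimer class below is a hypothetical sketch and does not reproduce JETM's API. It only illustrates the "measure first" idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a metrics library: wall-clock time per named section. */
public final class SectionTimer {
  // For each section: [0] = total nanoseconds, [1] = number of calls.
  private final Map<String, long[]> totals = new LinkedHashMap<String, long[]>();

  /** Runs the task and adds its elapsed time to the named section. */
  public void time(String section, Runnable task) {
    long start = System.nanoTime();
    try {
      task.run();
    } finally {
      long elapsed = System.nanoTime() - start;
      long[] t = totals.get(section);
      if (t == null) {
        t = new long[2];
        totals.put(section, t);
      }
      t[0] += elapsed;
      t[1]++;
    }
  }

  public long totalNanos(String section) {
    long[] t = totals.get(section);
    return t == null ? 0 : t[0];
  }

  public long calls(String section) {
    long[] t = totals.get(section);
    return t == null ? 0 : t[1];
  }

  /** Prints total and average milliseconds for each section. */
  public void report() {
    for (Map.Entry<String, long[]> e : totals.entrySet()) {
      long[] t = e.getValue();
      System.out.printf("%-12s calls=%d total=%.2fms avg=%.3fms%n",
          e.getKey(), t[1], t[0] / 1e6, t[0] / 1e6 / t[1]);
    }
  }
}
```

For example, wrap each decode step in timer.time("decode", () -> decodeNextFrame()), where decodeNextFrame is your own (hypothetical) method. Then call timer.report() at the end of a run.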
If you have measured performance and now want to optimize, here are some general ideas:
- Decrease the group-of-pictures size with IStreamCoder.setNumPicturesInGroup, which tells the encoder how often to insert a key frame (this spends less CPU, but trades off with lower compression), or play with other encoder settings; or
- Experiment with different settings for IStreamCoder.setBitRate. Lower values require more CPU time to hit. Higher values require less CPU time, but take up more disk space or network bandwidth; or
- Experiment with the IStreamCoder.setProperty() values. Use the com.xuggle.xuggler.Configuration class to print out the available options; or
- If you have to convert between pixel formats (e.g. YUV420P to BGR24), make sure you're using pixel formats that we have assembly optimizations for (i.e. use IPixelFormat.Type.YUV420P, or if you need RGB, use IPixelFormat.Type.BGR24 for your IVideoPictures and TYPE_3BYTE_BGR for Java BufferedImages); or
- Decode fewer frames, perhaps just the key frames; or
- Try a faster CPU or more memory; or
- Turn off logging.
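The sketch below decodes only the key frames and reuses one IPacket and one IVideoPicture for the whole file. It follows the Xuggler 3.x IContainer/IStreamCoder decode loop. The class name and the "count the frames" processing are hypothetical placeholders for your own work.

```java
import com.xuggle.xuggler.ICodec;
import com.xuggle.xuggler.IContainer;
import com.xuggle.xuggler.IPacket;
import com.xuggle.xuggler.IStreamCoder;
import com.xuggle.xuggler.IVideoPicture;

/** Hypothetical example: decode only the key frames of the first video stream. */
public class KeyFrameDecoder {
  public static int decodeKeyFrames(String filename) {
    IContainer container = IContainer.make();
    if (container.open(filename, IContainer.Type.READ, null) < 0)
      throw new IllegalArgumentException("could not open " + filename);

    // Find the first video stream.
    IStreamCoder decoder = null;
    int videoStream = -1;
    for (int i = 0; i < container.getNumStreams(); i++) {
      IStreamCoder coder = container.getStream(i).getStreamCoder();
      if (coder.getCodecType() == ICodec.Type.CODEC_TYPE_VIDEO) {
        videoStream = i;
        decoder = coder;
        break;
      }
    }
    if (decoder == null || decoder.open() < 0) {
      container.close();
      throw new IllegalStateException("no decodable video stream in " + filename);
    }

    // Allocate the packet and picture once and reuse them for every frame.
    IPacket packet = IPacket.make();
    IVideoPicture picture = IVideoPicture.make(
        decoder.getPixelType(), decoder.getWidth(), decoder.getHeight());
    int decoded = 0;
    while (container.readNextPacket(packet) >= 0) {
      // Skip other streams and every non-key (inter) frame before decoding.
      if (packet.getStreamIndex() != videoStream || !packet.isKeyPacket())
        continue;
      int offset = 0;
      while (offset < packet.getSize()) {
        int bytes = decoder.decodeVideo(picture, packet, offset);
        if (bytes <= 0)
          break; // decode error or no progress; give up on this packet
        offset += bytes;
        if (picture.isComplete())
          decoded++; // process the picture here
      }
    }
    decoder.close();
    container.close();
    return decoded;
  }
}
```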
How do I turn up or down logging?
Xuggler uses the Simple Logging Facade for Java (SLF4J) in its code and binds it to the logback system at runtime. In theory you can replace logback with any other logging framework that SLF4J supports, although we haven't really tested that.
To configure logging with logback, see the logback configuration guide. The basic trick is to put a file named logback.xml in your classpath (usually at its root). Here's an example logback.xml that does the following:
- disables all FFmpeg output;
- prints only Xuggler messages of level ERROR or higher;
- sends all messages to a file named "test.log";
- echoes them on the console.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <layout class="ch.qos.logback.classic.PatternLayout">
      <Pattern>%d{ISO8601} [%thread] %-5level %logger{35} - %msg%n</Pattern>
    </layout>
  </appender>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <File>test.log</File>
    <Append>false</Append>
    <Encoding>UTF-8</Encoding>
    <BufferedIO>false</BufferedIO>
    <ImmediateFlush>true</ImmediateFlush>
    <layout class="ch.qos.logback.classic.PatternLayout">
      <Pattern>%d{ISO8601} [%thread] %-5level %logger{35} - %msg%n</Pattern>
    </layout>
  </appender>
  <root>
    <level value="ERROR" />
    <appender-ref ref="CONSOLE" />
    <appender-ref ref="FILE" />
  </root>
  <logger name="com.xuggle">
    <level value="ERROR"/>
  </logger>
  <logger name="org.ffmpeg">
    <level value="OFF"/>
  </logger>
</configuration>
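You can also change the same logger levels at runtime instead of through logback.xml. The sketch below assumes that logback-classic is the SLF4J binding on your classpath, because it casts the SLF4J logger to logback's Logger class. XugglerLogLevels is a hypothetical name, and the logger names are taken from the configuration file.

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

/** Hypothetical helper that adjusts Xuggler and FFmpeg log levels at runtime. */
public final class XugglerLogLevels {
  private XugglerLogLevels() {}

  /** Sets the level of a named logger; the cast only works when SLF4J is bound to logback. */
  public static void setLevel(String loggerName, Level level) {
    Logger logger = (Logger) LoggerFactory.getLogger(loggerName);
    logger.setLevel(level);
  }

  /** Same effect as the logger entries in the logback.xml above. */
  public static void quiet() {
    setLevel("com.xuggle", Level.ERROR); // Xuggler: errors only
    setLevel("org.ffmpeg", Level.OFF);   // FFmpeg: nothing
  }
}
```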
Should I attempt to pool Xuggler objects?
It depends.
This is a very complicated topic and there is no real solution that works well for all cases. The general rule of thumb I can give you is:
- Spend a little time on up-front design, but don't worry about the Garbage Collector (GC) or MemoryModels while developing. If you come from C++ this may strike you as strange, but resist the urge to optimize.
- Re-use media objects (IPacket, IVideoPicture, IAudioSamples) instead of reallocating them if it's easy. Allocating media objects can cause more GCs than necessary. Don't worry about the other Xuggler objects (e.g. IStreamCoder), as you'll tend to allocate and deallocate those infrequently.
- When you're at the stage of testing and analyzing performance, identify your problem areas with metrics. They're likely not where you think. I like JETM or JAMon as APIs for capturing application-specific performance metrics.
- If you think GC is your issue, run some tests with different garbage collectors. Some garbage collectors do better with multiple processors; others have shorter pause times. Use the Java option "-verbose:gc" to see what collections are occurring.
- Then, and only then, decide whether it makes sense to pool objects.
- And if that doesn't work, only then should you investigate the com.xuggle.ferry.JNIMemoryManager's MemoryModel technology.
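Besides -verbose:gc, the standard java.lang.management API reports how many collections each collector has run and how long they took. That makes it easy to compare collectors (for example, a run with HotSpot's -XX:+UseParallelGC flag against a run with the default collector) from inside your own test harness. GcStats is a hypothetical helper name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper reporting per-collector GC counts and times for the running JVM. */
public final class GcStats {
  private GcStats() {}

  /** Total collections across all collectors; a collector reports -1 if the count is undefined. */
  public static long totalCollections() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long count = gc.getCollectionCount();
      if (count > 0)
        total += count;
    }
    return total;
  }

  /** Prints one line per collector, e.g. a young-generation and an old-generation collector. */
  public static void print() {
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.println(gc.getName() + ": " + gc.getCollectionCount()
          + " collections, " + gc.getCollectionTime() + " ms");
    }
  }
}
```

Calling GcStats.print() before and after a decoding run shows how many young and old collections that run triggered.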
The Java GC is very, very fast at destroying dead objects. Really fast: a copying young-generation collector never even touches dead objects, so reclaiming them costs next to nothing. But it turns out that collecting LIVE large objects is not free. If a large object stays alive across a collection, the GC copies the entire object on every collection it survives.
Now, Sun's Java implementation uses a generational garbage collector. Objects start in a Young generation, and if they are copied more than X times, they get moved to a "Tenured" generation. The GC first tries to collect only the Young generation, since 85% to 98% of all objects die young (a tragedy unnoticed by the world's media, I'll point out). If that recovers enough memory, it never collects the Tenured generation. But if it can't free enough space in the Young generation, it (generally) stops the world and collects the Tenured generation*. This means that your "reused" object may be copied again and again until it is tenured, all for nothing. Had you allocated it new each time, it would have been cheaply discarded on each small collection.
So if you're re-using large objects because you think that will improve performance, measure performance first. If you still think this is the way to go, your goal should be to get your large objects into the tenured generation as quickly as possible. You should also make your young generation large enough that a full (tenured) collection occurs only rarely. You can experiment with this through the tuning parameters you pass to Java, and by passing flags that show you how it's collecting.
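To watch the young and tenured generations described above, you can list the JVM's heap memory pools. Pool names depend on the collector (for example "PS Eden Space" or "G1 Old Gen"). HotSpot flags such as -Xmn (young generation size) and -XX:MaxTenuringThreshold are the usual knobs for the experiments this paragraph suggests. HeapPools is a hypothetical class name for this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper listing heap pools (eden, survivor, old/tenured) and their usage. */
public final class HeapPools {
  private HeapPools() {}

  /** Prints each heap pool and returns how many were found. */
  public static int printHeapPools() {
    int pools = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() != MemoryType.HEAP)
        continue; // skip non-heap pools such as the code cache
      MemoryUsage usage = pool.getUsage();
      if (usage == null)
        continue; // pool is no longer valid
      String max = usage.getMax() < 0 ? "undefined" : String.format("%,d KB", usage.getMax() / 1024);
      System.out.printf("%-25s used=%,d KB max=%s%n", pool.getName(), usage.getUsed() / 1024, max);
      pools++;
    }
    return pools;
  }
}
```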
That's all I have to say on this topic. At this stage I strongly suggest you stop reading and go back to not worrying about things. Really, I mean that. Honest.
Seriously.
Stop reading.
I'm not kidding.
But if you're a real glutton for pain, there's one more interesting thing going on that you may want to be aware of. I'm documenting this here so folks who care can see what's going on, but I really do recommend that most people ignore this.
To make things more complicated, the data storage for large media objects in Xuggler (basically objects that implement IMediaData) has to be native memory, so that FFmpeg can access it. By default Xuggler has Java allocate this memory for us, by creating a backing byte[] array on the Java heap. That way the Java GC 'accounts' for the large objects and uses their size to decide when a collection is necessary. You can disable that behavior (see the JNIMemoryManager's memory model documentation) to force us to use our own variant of malloc()/free() instead. Our variant ensures 128-bit alignment and adds a small header for our own nefarious purposes.
In general, this is what 99.999% of all applications want to happen. But Sun's JVM has limitations here (they have to do with multi-threaded collections, which I won't go into now). As a result, each object takes up more memory than strictly necessary. How much more depends on the JVM, but it can be as much as 2x per video frame. And because it's native memory, you MAY eventually run into fragmentation issues even though the Java GC compacts its own heap (this is unlikely unless you're constantly changing the video size). If you think that's your issue, you can try telling Xuggler not to use Java for large-object native memory allocation. But then you'll want to set a very small Java heap size, for two reasons. First, Xuggler can't release the data held by a Java program until Java collects the objects that reference the Xuggler objects. Second, if we haven't used Java to allocate the memory, Java will think those objects are smaller than they actually are, so it may not collect frequently enough.
To illustrate the problem, imagine you have a 480x360 BGR IVideoPicture. That takes about 500 KB of memory to hold all the pixel data (480 x 360 pixels x 3 bytes = 518,400 bytes). If we use Java to allocate the native memory, the Java garbage collector will assume that the IVideoPicture object references about 500 KB of Java memory (which may be backed by another 500 KB of native memory in some JVM situations). If we don't use Java to allocate the native memory, the Java garbage collector will assume the IVideoPicture references about 20 bytes. It will be completely unaware of the underlying 500 KB of native memory, and so won't collect as often. Suddenly you may find yourself getting strange OutOfMemoryErrors (often saying "you don't have enough swap space"). Because of this we default to having Java allocate the memory; only turn this off as a last resort. Turning it off can improve performance when you're running inside a JVM with very tight memory constraints. But then you either have to take a lot more responsibility for managing your own memory, or trick the Java garbage collector into collecting those 20-byte objects more aggressively.
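The roughly 500 KB-per-frame figure comes from simple arithmetic: a packed BGR24 frame uses 3 bytes per pixel. The sketch assumes no row padding, so real IVideoPicture buffers may be slightly larger because of alignment. FrameMemory is a hypothetical class name.

```java
/** Hypothetical helper: bytes needed for one packed 24-bit BGR frame. */
public final class FrameMemory {
  private FrameMemory() {}

  /** BGR24 stores 3 bytes (blue, green, red) per pixel; no row padding is assumed. */
  public static long bgr24Bytes(int width, int height) {
    return (long) width * height * 3;
  }

  public static void main(String[] args) {
    long bytes = bgr24Bytes(480, 360);
    // prints "518400 bytes (~506 KB) per 480x360 BGR frame"
    System.out.println(bytes + " bytes (~" + bytes / 1024 + " KB) per 480x360 BGR frame");
  }
}
```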
One last thing on this rant: if you want, you can take complete control of when Xuggler frees native memory yourself. See:
com.xuggle.ferry.JNIMemoryManager.setMemoryModel(MemoryModel model);
for details.
When you're done with an object, call "obj.delete()". This will release any underlying native references immediately but also means that you must ensure obj is no longer used. We try to throw a nice java error if you use an object after you call delete(), but we cannot guarantee safety if you're having multiple threads access that object. See the com.xuggle.ferry documentation for details.
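The sketch below combines the two ideas above: it switches the memory model and then frees a picture explicitly with delete(). It assumes that the JNIMemoryManager.MemoryModel enum has a NATIVE_BUFFERS constant for the malloc()/free() variant described earlier. Check the JNIMemoryManager documentation for the exact names in your Xuggler version. The sketch sets the model once at startup, before creating Xuggler objects, on the assumption that this is the safest place to do it. ExplicitRelease is a hypothetical class name.

```java
import com.xuggle.ferry.JNIMemoryManager;
import com.xuggle.ferry.JNIMemoryManager.MemoryModel;
import com.xuggle.xuggler.IPixelFormat;
import com.xuggle.xuggler.IVideoPicture;

/** Hypothetical example of explicit native-memory release. */
public class ExplicitRelease {
  /** Allocates one BGR24 picture, uses it, and frees its native memory immediately. */
  static int allocateAndRelease(int width, int height) {
    IVideoPicture picture = IVideoPicture.make(IPixelFormat.Type.BGR24, width, height);
    try {
      // ... fill and use the picture here ...
      return picture.getWidth();
    } finally {
      // Releases native memory now rather than when the GC collects the Java object.
      // The picture must not be used again afterwards, by this or any other thread.
      picture.delete();
    }
  }

  public static void main(String[] args) {
    // Assumed constant: use native malloc()/free() buffers instead of Java-heap-backed ones.
    // Only do this after measuring; see the warnings above.
    JNIMemoryManager.setMemoryModel(MemoryModel.NATIVE_BUFFERS);
    System.out.println(allocateAndRelease(480, 360));
  }
}
```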
(*) Different garbage collectors actually behave somewhat differently. For example the concurrent mark and sweep collector running with the parallel young generation collector will not actually compact memory and so memory fragmentation can become an issue. Read the docs on each for details, but in general follow the rules of MEASURE YOUR PERFORMANCE first, and then decide what to do.
