avreader
avreader.load
Load audio, video or both.
avreader.load – Return audiovisual data, frame rate and sample rate.
avreader.load_audio – Return data and the sample rate from an audio file.
avreader.load_video – Return data and the frame rate from a video file.
-
avreader.load(fpath, offset=0.0, duration=None, akwargs=None, vkwargs=None)
Return audiovisual data, frame rate and sample rate.
- Parameters
fpath (str) – Path to the input file.
offset (Union[float, str], optional (default=0.0)) – Start reading after this time. Offset must be a time duration specification, see https://www.ffmpeg.org/ffmpeg-utils.html#time-duration-syntax.
duration (Union[float, str, None], optional (default=None)) – Only load up to this much of the file. Duration must be a time duration specification, see https://www.ffmpeg.org/ffmpeg-utils.html#time-duration-syntax.
frame_rate (Optional[float], optional (default=None)) – [description]
frame_size (Optional[str], optional (default=None)) – [description]
grayscale (bool, optional (default=False)) – Convert the video to grayscale.
sample_rate (Optional[float], optional (default=None)) – Target sampling rate. If None, sample_rate is the native sampling rate.
mono (bool, optional (default=True)) – Convert the signal to mono.
data_format (str, optional (default="channels_first")) – The ordering of the dimensions in the outputs. If “channels_last”, data_format corresponds to outputs with shape (batch, steps, channels) while “channels_first” corresponds to outputs with shape (batch, channels, steps).
dtype (torch.dtype, optional (default=torch.float)) – Desired output data type for the tensor, e.g., torch.int16.
- Returns
video (Tuple[torch.Tensor, int]) – Data read from the video stream and its frame rate.
audio (Tuple[torch.Tensor, int]) – Data read from the audio stream and its sample rate.
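The offset and duration parameters accept either a number of seconds or a string in FFmpeg's time duration syntax. As a rough illustration of what such strings mean, the sketch below converts the two documented forms, [-][HH:]MM:SS[.m...] and [-]S+[.m...][s|ms|us], to seconds. The helper duration_to_seconds is hypothetical and not part of avreader; how avreader itself handles these values is not shown in this reference.

```python
import re

# The two forms from the FFmpeg time duration syntax:
#   [-][HH:]MM:SS[.m...]   and   [-]S+[.m...][s|ms|us]
_CLOCK = re.compile(r"^(-)?(?:(\d+):)?(\d{1,2}):(\d{1,2}(?:\.\d+)?)$")
_PLAIN = re.compile(r"^(-)?(\d+(?:\.\d+)?)(s|ms|us)?$")
_UNIT = {None: 1.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def duration_to_seconds(value):
    """Convert a number or an FFmpeg-style duration string to seconds.

    Hypothetical helper for illustration only; not an avreader function.
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip()
    match = _CLOCK.match(text)
    if match:
        sign, hours, minutes, seconds = match.groups()
        total = int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
        return -total if sign else total
    match = _PLAIN.match(text)
    if match:
        sign, number, unit = match.groups()
        total = float(number) * _UNIT[unit]
        return -total if sign else total
    raise ValueError(f"not a valid time duration: {value!r}")
```

Under these assumptions, an offset of "00:01:30.5" and an offset of 90.5 describe the same start position.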
-
avreader.load_audio(fpath, offset=0.0, duration=None, sample_rate=None, mono=True, filters=None, data_format='channels_first', dtype=torch.float32)
Return data and the sample rate from an audio file.
- Parameters
fpath (str) – Path to the input file.
offset (Union[float, str], optional (default=0.0)) – Start reading after this time. Offset must be a time duration specification, see https://www.ffmpeg.org/ffmpeg-utils.html#time-duration-syntax.
duration (Union[float, str, None], optional (default=None)) – Only load up to this much audio. Duration must be a time duration specification, see https://www.ffmpeg.org/ffmpeg-utils.html#time-duration-syntax.
sample_rate (Optional[float], optional (default=None)) – Target sampling rate. If None, sample_rate is the native sampling rate.
mono (bool, optional (default=True)) – Convert the signal to mono.
filters (Optional[str], optional (default=None)) – Add an FFmpeg filtergraph, see https://ffmpeg.org/ffmpeg-filters.html.
data_format (str, optional (default="channels_first")) – The ordering of the dimensions of the output audio. If “channels_last”, data_format corresponds to an output tensor with shape (seq_len, channels) while “channels_first” corresponds to an output tensor with shape (channels, seq_len).
dtype (torch.dtype, optional (default=torch.float)) – Desired output data type for the tensor, e.g., torch.int16.
- Returns
audio (torch.Tensor) – Data read from audio file.
sample_rate (int) – Sample rate (in samples/sec) of audio file.
- Raises
ValueError – [description]
subprocess.CalledProcessError – If the FFmpeg command fails.
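The two data_format layouts for audio differ only in the order of the two axes. The sketch below shows the relationship using plain nested lists as a stand-in for tensors, so it runs without torch or avreader; transpose_audio and audio_shape are hypothetical helpers, not library functions. With a real torch.Tensor the same swap can be done with tensor.transpose(0, 1).

```python
def transpose_audio(audio):
    """Swap the two axes of a rectangular nested list.

    Turns (channels, seq_len) into (seq_len, channels) and vice versa.
    """
    return [list(row) for row in zip(*audio)]


def audio_shape(audio):
    """Return (rows, columns) of a rectangular nested list."""
    return len(audio), (len(audio[0]) if audio else 0)


# Two channels of three samples each, laid out as "channels_first".
stereo_channels_first = [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]

# The same samples laid out as "channels_last": one row per time step.
stereo_channels_last = transpose_audio(stereo_channels_first)
```

Here audio_shape(stereo_channels_first) is (2, 3) and audio_shape(stereo_channels_last) is (3, 2); applying transpose_audio twice gives back the original layout.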
-
avreader.load_video(fpath, offset=0.0, duration=None, frame_rate=None, frame_size=None, grayscale=False, filters=None, data_format='channels_first', dtype=torch.float32)
Return data and the frame rate from a video file.
Return a torch.Tensor (C, H, W) in the range [0.0, 1.0] if dtype is a floating-point type. For other dtypes, tensors are returned without scaling.
- Parameters
fpath (str) – Path to the input file.
offset (Union[float, str], optional (default=0.0)) – Start reading after this time. Offset must be a time duration specification, see https://www.ffmpeg.org/ffmpeg-utils.html#time-duration-syntax.
duration (Union[float, str, None], optional (default=None)) – Only load up to this much video. Duration must be a time duration specification, see https://www.ffmpeg.org/ffmpeg-utils.html#time-duration-syntax.
frame_rate (Optional[float], optional (default=None)) – Target frame rate. If None, frame_rate is the native frame rate.
frame_size (Union[int, Tuple[int, int], None], optional (default=None)) – Target frame size (width, height). If None, frame_size is the native frame size. The value can also be a single int giving the height of the frame; the width is then calculated automatically to preserve the aspect ratio. Equivalently, define only one component, either height or width, and set the other component to -1.
grayscale (bool, optional (default=False)) – Convert the video to grayscale.
filters (str, optional (default=None)) – Add an FFmpeg filtergraph, see https://ffmpeg.org/ffmpeg-filters.html.
data_format (str, optional (default="channels_first")) – The ordering of the dimensions of the output video tensor. If “channels_last”, data_format corresponds to an output with shape (seq_len, height, width, channels) while “channels_first” corresponds to an output with shape (seq_len, channels, height, width).
dtype (torch.dtype, optional (default=torch.float)) – Desired output data type for the tensor, e.g., torch.int16. Any torch dtype is accepted except torch.bool and torch.int8.
- Returns
video (torch.Tensor) – Tensor of the form (seq_len, channels, height, width) with seq_len representing the selected number of frames of the video.
frame_rate (int) – The frame rate corresponding to the video.
- Raises
TypeError – [description]
ValueError – [description]
subprocess.CalledProcessError – If the FFmpeg command fails.
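The frame_size rules can be easier to follow as arithmetic. The sketch below resolves a frame_size value to a concrete (width, height) pair given the native size of the video. It is a hypothetical helper, not avreader's implementation, and it assumes that a bare int is the target height and that the missing dimension is rounded to the nearest integer; the actual rounding is left to FFmpeg and may differ.

```python
def resolve_frame_size(native_size, frame_size=None):
    """Return the target (width, height) for a frame_size specification.

    Hypothetical helper. Assumes a bare int is the target height and that
    -1 marks the component to derive from the native aspect ratio.
    """
    native_width, native_height = native_size
    if frame_size is None:
        # No resizing requested: keep the native frame size.
        return native_width, native_height
    if isinstance(frame_size, int):
        # A single int is treated as (-1, height).
        frame_size = (-1, frame_size)
    width, height = frame_size
    if width == -1 and height == -1:
        raise ValueError("at least one of width and height must be given")
    if width == -1:
        width = round(native_width * height / native_height)
    elif height == -1:
        height = round(native_height * width / native_width)
    return width, height
```

For a 1920x1080 source, a frame_size of 720 and a frame_size of (-1, 720) both resolve to (1280, 720) under these assumptions, while (640, -1) resolves to (640, 360).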
avreader.write
Write audio, video or both.
avreader.write_audio – Write a torch tensor as a WAV file.
avreader.write_video – Write a torch tensor as an MP4 file.
-
avreader.write_audio(fpath, audio, sample_rate, mono=True, filters=None, overwrite=True, codec='pcm_s16le', data_format='channels_first')
Write a torch tensor as a WAV file.
- Parameters
fpath (str) – Path to the output file.
audio (torch.Tensor) – A torch tensor containing the audio data.
sample_rate (int) – The audio input sample rate (in samples/sec).
filters (Optional[str], optional (default=None)) – Add an FFmpeg filtergraph, see https://ffmpeg.org/ffmpeg-filters.html.
overwrite (bool, optional (default=True)) – Overwrite output file if it exists.
codec (str, optional (default="pcm_s16le")) – Audio codec to be used to encode the data, see the FFmpeg documentation (https://ffmpeg.org/ffmpeg-codecs.html) for the list of compatible codecs.
data_format (str, optional (default="channels_first")) – The ordering of the dimensions of the input audio. If “channels_last”, data_format corresponds to an input tensor with shape (seq_len, channels) while “channels_first” corresponds to an input tensor with shape (channels, seq_len).
- Raises
TypeError – [description]
ValueError – [description]
subprocess.CalledProcessError – If the FFmpeg command fails.
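The default codec, pcm_s16le, stores each sample as a signed 16-bit little-endian integer. To make that concrete, the sketch below writes a mono 16-bit PCM WAV using only Python's standard wave module rather than avreader or FFmpeg. The helpers float_to_int16 and write_wav_mono are hypothetical, and scaling floats in [-1.0, 1.0] by 32767 is one common convention, not necessarily the one avreader applies.

```python
import io
import struct
import wave


def float_to_int16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


def write_wav_mono(fileobj, samples, sample_rate):
    """Write mono float samples to fileobj as a 16-bit PCM WAV (illustration only)."""
    ints = float_to_int16(samples)
    # The wave module takes frames in native byte order and stores them
    # little-endian in the file, which matches the pcm_s16le layout.
    frames = struct.pack(f"={len(ints)}h", *ints)
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate)  # samples per second
        wav.writeframes(frames)


# Write four samples at 16 kHz into an in-memory buffer; 2.0 is clipped to 1.0.
buffer = io.BytesIO()
write_wav_mono(buffer, [0.0, 1.0, -1.0, 2.0], 16000)
```

Reading the buffer back with wave.open(buffer, "rb") after buffer.seek(0) reports one channel, a sample width of 2 bytes and a frame rate of 16000.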
-
avreader.write_video(fpath, video, frame_rate, frame_size=None, filters=None, overwrite=True, codec='libx264', data_format='channels_first')
Write a torch tensor as an MP4 file.
- Parameters
fpath (str) – Path to the output file.
video (torch.Tensor) – A torch tensor containing the video data.
frame_rate (int) – The video input frame rate (in frames/sec).
frame_size (Union[int, Tuple[int, int], None], optional (default=None)) – Target frame size (width, height). If None, frame_size is the native frame size given by the size of the input video tensor. The value can also be a single int giving the height of the frame; the width is then calculated automatically to preserve the aspect ratio. Equivalently, define only one component, either height or width, and set the other component to -1.
filters (Optional[str], optional (default=None)) – Add an FFmpeg filtergraph, see https://ffmpeg.org/ffmpeg-filters.html.
overwrite (bool, optional (default=True)) – Overwrite output file if it exists.
codec (str, optional (default="libx264")) – Video codec to be used to encode the data, see the FFmpeg documentation (https://ffmpeg.org/ffmpeg-codecs.html) for the list of compatible codecs.
data_format (str, optional (default="channels_first")) – The ordering of the dimensions of the input video tensor. If “channels_last”, data_format corresponds to an input with shape (seq_len, height, width, channels) while “channels_first” corresponds to an input with shape (seq_len, channels, height, width).
- Raises
TypeError – [description]
ValueError – [description]
subprocess.CalledProcessError – If the FFmpeg command fails.
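When frame_size is None, the native frame size comes from the shape of the input tensor, and which axes hold height and width depends on data_format. The sketch below reads the (width, height) pair and the clip length from a 4-D shape tuple. Both helpers are hypothetical and operate on a plain tuple such as tuple(video.shape), so they run without torch or avreader.

```python
def native_frame_size(shape, data_format="channels_first"):
    """Return (width, height) from a 4-D video shape.

    channels_first: (seq_len, channels, height, width)
    channels_last:  (seq_len, height, width, channels)
    """
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D shape, got {len(shape)} dimensions")
    if data_format == "channels_first":
        _, _, height, width = shape
    elif data_format == "channels_last":
        _, height, width, _ = shape
    else:
        raise ValueError(f"unknown data_format: {data_format!r}")
    return width, height


def clip_duration(shape, frame_rate):
    """Return the clip length in seconds: number of frames divided by frame rate."""
    return shape[0] / frame_rate
```

For example, a channels_first shape of (50, 3, 480, 640) and a channels_last shape of (50, 480, 640, 3) both describe 640x480 frames, and at 25 frames/sec the clip lasts 2.0 seconds.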