Class: Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig
- Inherits:
- Object
- Extended by:
- Protobuf::MessageExts::ClassMethods
- Includes:
- Protobuf::MessageExts
- Defined in:
- proto_docs/google/cloud/discoveryengine/v1beta/document_processing_config.rb
Overview
A singleton resource of DataStore. If it is empty when the DataStore is created and the DataStore is set to DataStore.ContentConfig.CONTENT_REQUIRED, the default parser is the digital parser.
Defined Under Namespace
Classes: ChunkingConfig, ParsingConfig, ParsingConfigOverridesEntry
Instance Attribute Summary
- #chunking_config ⇒ ::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ChunkingConfig
Whether chunking mode is enabled.
- #default_parsing_config ⇒ ::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig
Configurations for default Document parser.
- #name ⇒ ::String
The full resource name of the Document Processing Config.
- #parsing_config_overrides ⇒ ::Google::Protobuf::Map{::String => ::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig}
Map from file type to override the default parsing configuration based on the file type.
Instance Attribute Details
#chunking_config ⇒ ::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ChunkingConfig
Returns whether chunking mode is enabled.
# File 'proto_docs/google/cloud/discoveryengine/v1beta/document_processing_config.rb', line 61

class DocumentProcessingConfig
  include ::Google::Protobuf::MessageExts
  extend ::Google::Protobuf::MessageExts::ClassMethods

  # Configuration for chunking config.
  # @!attribute [rw] layout_based_chunking_config
  #   @return [::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ChunkingConfig::LayoutBasedChunkingConfig]
  #     Configuration for the layout based chunking.
  class ChunkingConfig
    include ::Google::Protobuf::MessageExts
    extend ::Google::Protobuf::MessageExts::ClassMethods

    # Configuration for the layout based chunking.
    # @!attribute [rw] chunk_size
    #   @return [::Integer]
    #     The token size limit for each chunk.
    #
    #     Supported values: 100-500 (inclusive).
    #     Default value: 500.
    # @!attribute [rw] include_ancestor_headings
    #   @return [::Boolean]
    #     Whether to include appending different levels of headings to chunks
    #     from the middle of the document to prevent context loss.
    #
    #     Default value: False.
    class LayoutBasedChunkingConfig
      include ::Google::Protobuf::MessageExts
      extend ::Google::Protobuf::MessageExts::ClassMethods
    end
  end

  # Related configurations applied to a specific type of document parser.
  # @!attribute [rw] digital_parsing_config
  #   @return [::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig::DigitalParsingConfig]
  #     Configurations applied to digital parser.
  # @!attribute [rw] ocr_parsing_config
  #   @return [::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig::OcrParsingConfig]
  #     Configurations applied to OCR parser. Currently it only applies to
  #     PDFs.
  # @!attribute [rw] layout_parsing_config
  #   @return [::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig::LayoutParsingConfig]
  #     Configurations applied to layout parser.
  class ParsingConfig
    include ::Google::Protobuf::MessageExts
    extend ::Google::Protobuf::MessageExts::ClassMethods

    # The digital parsing configurations for documents.
    class DigitalParsingConfig
      include ::Google::Protobuf::MessageExts
      extend ::Google::Protobuf::MessageExts::ClassMethods
    end

    # The OCR parsing configurations for documents.
    # @!attribute [rw] enhanced_document_elements
    #   @deprecated This field is deprecated and may be removed in the next major version update.
    #   @return [::Array<::String>]
    #     [DEPRECATED] This field is deprecated. To use the additional enhanced
    #     document elements processing, please switch to `layout_parsing_config`.
    # @!attribute [rw] use_native_text
    #   @return [::Boolean]
    #     If true, will use native text instead of OCR text on pages containing
    #     native text.
    class OcrParsingConfig
      include ::Google::Protobuf::MessageExts
      extend ::Google::Protobuf::MessageExts::ClassMethods
    end

    # The layout parsing configurations for documents.
    class LayoutParsingConfig
      include ::Google::Protobuf::MessageExts
      extend ::Google::Protobuf::MessageExts::ClassMethods
    end
  end

  # @!attribute [rw] key
  #   @return [::String]
  # @!attribute [rw] value
  #   @return [::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig]
  class ParsingConfigOverridesEntry
    include ::Google::Protobuf::MessageExts
    extend ::Google::Protobuf::MessageExts::ClassMethods
  end
end
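The source comments for LayoutBasedChunkingConfig give chunk_size a supported range of 100 to 500 tokens (inclusive) with a default of 500, and include_ancestor_headings a default of false. The following is a minimal sketch of checking those limits on the client side before building a message. The helper name layout_chunking_options and both constants are hypothetical and not part of the library; the helper returns a plain Hash so that it runs without the gem installed.

```ruby
# Hypothetical helper, not part of google-cloud-discovery_engine.
# The field names and limits restate the LayoutBasedChunkingConfig comments.
CHUNK_SIZE_RANGE = (100..500).freeze
DEFAULT_CHUNK_SIZE = 500

# Returns a Hash shaped like ChunkingConfig, or raises ArgumentError when
# chunk_size falls outside the documented range.
def layout_chunking_options(chunk_size: DEFAULT_CHUNK_SIZE, include_ancestor_headings: false)
  unless chunk_size.is_a?(Integer) && CHUNK_SIZE_RANGE.cover?(chunk_size)
    raise ArgumentError,
          "chunk_size must be an Integer in #{CHUNK_SIZE_RANGE}, got #{chunk_size.inspect}"
  end

  {
    layout_based_chunking_config: {
      chunk_size: chunk_size,
      # Normalize any truthy/falsy input to a strict boolean.
      include_ancestor_headings: include_ancestor_headings ? true : false
    }
  }
end
```

The returned Hash mirrors the field layout of ChunkingConfig. Passing it as the chunking_config value when constructing a DocumentProcessingConfig is an assumption based on how protobuf-generated Ruby messages usually accept nested hashes, and is worth confirming against the installed gem version.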
#default_parsing_config ⇒ ::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig
Returns configurations for the default Document parser. If not specified, it is configured as the default DigitalParsingConfig, and the default parsing config is applied to all file types for Document parsing.
#name ⇒ ::String
Returns the full resource name of the Document Processing Config.
Format: projects/*/locations/*/collections/*/dataStores/*/documentProcessingConfig.
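As an illustration of the resource name format, the sketch below assembles the name from its four variable segments. The helper document_processing_config_name and its keyword arguments are hypothetical and not part of the library; only the path template comes from the format shown above.

```ruby
# Hypothetical helper, not part of the library. The path template is the
# documented format with each "*" replaced by a caller-supplied segment.
def document_processing_config_name(project:, location:, collection:, data_store:)
  segments = { project: project, location: location,
               collection: collection, data_store: data_store }

  # Reject empty segments and segments that would change the path depth.
  segments.each do |key, value|
    text = value.to_s
    if text.empty? || text.include?("/")
      raise ArgumentError, "#{key} must be a non-empty string without '/'"
    end
  end

  "projects/#{project}/locations/#{location}/collections/#{collection}" \
    "/dataStores/#{data_store}/documentProcessingConfig"
end

# Placeholder values for illustration only.
puts document_processing_config_name(
  project: "my-project", location: "global",
  collection: "default_collection", data_store: "my-store"
)
```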
#parsing_config_overrides ⇒ ::Google::Protobuf::Map{::String => ::Google::Cloud::DiscoveryEngine::V1beta::DocumentProcessingConfig::ParsingConfig}
Returns a map from file type to override the default parsing configuration based on the file type. Supported keys:
pdf: Override parsing config for PDF files; digital parsing, OCR parsing, or layout parsing is supported.
html: Override parsing config for HTML files; only digital parsing and layout parsing are supported.
docx: Override parsing config for DOCX files; only digital parsing and layout parsing are supported.
pptx: Override parsing config for PPTX files; only digital parsing and layout parsing are supported.
xlsm: Override parsing config for XLSM files; only digital parsing and layout parsing are supported.
xlsx: Override parsing config for XLSX files; only digital parsing and layout parsing are supported.
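The supported keys above can be sketched as a lookup table, together with the fallback described for default_parsing_config (a digital parser when nothing else is set). Everything named here (SUPPORTED_PARSERS, validate_parsing_overrides!, parsing_config_for) is hypothetical and not part of the library; the sketch works on plain Hashes keyed by the ParsingConfig field names so that it runs without the gem.

```ruby
# Hypothetical helpers, not part of the library. The table restates the
# supported keys and parser types listed above.
ALL_PARSERS = %i[digital_parsing_config ocr_parsing_config layout_parsing_config].freeze
NON_OCR_PARSERS = %i[digital_parsing_config layout_parsing_config].freeze

SUPPORTED_PARSERS = {
  "pdf"  => ALL_PARSERS,
  "html" => NON_OCR_PARSERS,
  "docx" => NON_OCR_PARSERS,
  "pptx" => NON_OCR_PARSERS,
  "xlsm" => NON_OCR_PARSERS,
  "xlsx" => NON_OCR_PARSERS
}.freeze

# Raises ArgumentError when an override names an unsupported file type or a
# parser that the file type does not support. Returns the overrides unchanged.
def validate_parsing_overrides!(overrides)
  overrides.each do |file_type, parsing_config|
    allowed = SUPPORTED_PARSERS.fetch(file_type) do
      raise ArgumentError, "unsupported file type: #{file_type.inspect}"
    end

    unsupported = parsing_config.keys - allowed
    next if unsupported.empty?

    raise ArgumentError, "#{file_type} does not support #{unsupported.join(', ')}"
  end
  overrides
end

# Returns the override for file_type when one exists, otherwise the default.
# The default mirrors the documented fallback to a digital parser.
def parsing_config_for(file_type, overrides:, default: { digital_parsing_config: {} })
  overrides.fetch(file_type, default)
end
```

For example, with an OCR override for "pdf" only, parsing_config_for("pdf", ...) returns the OCR settings while parsing_config_for("html", ...) falls back to the digital parser default. The service performs its own validation; this client-side check is only a convenience for catching mistakes earlier.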