How to extract all text as strings from a DWG file?

I just want to extract the block table texts as strings. I am drowning in a swamp of custom datatypes for metadata and formatting (desperately seeking standard string datatypes). I just want the text, to help make drawings more searchable by database search.

I wish I could just call something like Blocktable.ToString() and get the text data as is. I would save it into a txt file and everybody would be happy.

@esoinila

To extract all text as strings from a DWG file, including block table texts, you can utilize the Aspose.CAD library. Below is a sample code snippet that demonstrates how to achieve this in C#:

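using System;
using Aspose.CAD;
using Aspose.CAD.FileFormats.Cad;            // CadImage
using Aspose.CAD.FileFormats.Cad.CadConsts;  // CadEntityTypeName
using Aspose.CAD.FileFormats.Cad.CadObjects;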

namespace Aspose.CAD.Examples
{
    public class ExtractTextFromDWG
    {
        public static void Main()
        {
            // Load the DWG file
            string sourceFilePath = "path_to_your_file.dwg";
            using (CadImage cadImage = (CadImage)Image.Load(sourceFilePath))
            {
                // Iterate through entities to extract text
                foreach (CadEntityBase entity in cadImage.Entities)
                {
                    ExtractText(entity);
                }

                // Iterate through block entities to extract text
                foreach (CadBlockEntity blockEntity in cadImage.BlockEntities.Values)
                {
                    foreach (var entity in blockEntity.Entities)
                    {
                        ExtractText(entity);
                    }
                }
            }
        }

        private static void ExtractText(CadEntityBase entity)
        {
            switch (entity.TypeName)
            {
                case CadEntityTypeName.TEXT:
                    CadText textEntity = (CadText)entity;
                    Console.WriteLine(textEntity.DefaultValue);
                    break;

                case CadEntityTypeName.MTEXT:
                    CadMText mTextEntity = (CadMText)entity;
                    Console.WriteLine(mTextEntity.Text);
                    break;

                case CadEntityTypeName.INSERT:
                    CadInsertObject insertEntity = (CadInsertObject)entity;
                    foreach (var childEntity in insertEntity.ChildObjects)
                    {
                        ExtractText(childEntity);
                    }
                    break;

                case CadEntityTypeName.ATTDEF:
                    CadAttDef attDefEntity = (CadAttDef)entity;
                    Console.WriteLine(attDefEntity.DefaultString);
                    break;

                case CadEntityTypeName.ATTRIB:
                    CadAttrib attribEntity = (CadAttrib)entity;
                    Console.WriteLine(attribEntity.DefaultText);
                    break;
            }
        }
    }
}

In this code, we load the DWG file and iterate through both the main entities and block entities to extract text. The ExtractText method checks the type of each entity and prints the text accordingly. This should help you gather all the text data you need for your database search.

For further reference, see the example code in the Aspose.CAD GitHub repository.

If you have any more questions or need further assistance, feel free to ask!

Thanks, that compiled and it got the texts out as far as I have checked. Have an excellent weekend.

@esoinila,
Hello,
glad to hear you are happy with the results :)
Please note one additional point about the example.
Iterating over all block entities in the drawing, as in this loop, may be excessive, since it can also visit block definitions that no insert references:

foreach (CadBlockEntity blockEntity in cadImage.BlockEntities.Values)
{
    foreach (var entity in blockEntity.Entities)
    {
        ExtractText(entity);
    }
}

A more targeted approach is to look up the block that each CadInsertObject references by name and extract text only from that block's entities:

CadInsertObject insert = (CadInsertObject)entity;
CadBlockEntity block = cadImage.BlockEntities[insert.Name];

foreach (CadEntityBase blockEntity in block.Entities)
{
    ExtractText(blockEntity);
}
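
Putting the two posts together, here is a minimal sketch of how the lookup by insert name could replace the loop over all block entities, with the results saved to a text file as the original question asked. It uses only the Aspose.CAD members that already appear in this thread. The class name DwgTextDump, the output file name, the AddIfNotEmpty helper and the MaxDepth guard are hypothetical additions for illustration, and the using directives are an assumption that may need adjusting for your Aspose.CAD version.

```csharp
using System.Collections.Generic;
using System.IO;
using Aspose.CAD;
using Aspose.CAD.FileFormats.Cad;
using Aspose.CAD.FileFormats.Cad.CadConsts;
using Aspose.CAD.FileFormats.Cad.CadObjects;

public static class DwgTextDump // hypothetical class name
{
    // Hypothetical safety limit so deeply nested block references cannot recurse forever.
    private const int MaxDepth = 16;

    public static void Main()
    {
        string sourceFilePath = "path_to_your_file.dwg";
        string outputFilePath = "extracted_text.txt"; // hypothetical output name

        var texts = new List<string>();
        using (CadImage cadImage = (CadImage)Image.Load(sourceFilePath))
        {
            // Only top-level entities are walked here; blocks are reached through their inserts.
            foreach (CadEntityBase entity in cadImage.Entities)
            {
                CollectText(cadImage, entity, texts, 0);
            }
        }

        // One extracted string per line, ready for indexing or a database import.
        File.WriteAllLines(outputFilePath, texts);
    }

    private static void CollectText(CadImage cadImage, CadEntityBase entity, List<string> texts, int depth)
    {
        if (depth > MaxDepth)
        {
            return;
        }

        switch (entity.TypeName)
        {
            case CadEntityTypeName.TEXT:
                AddIfNotEmpty(texts, ((CadText)entity).DefaultValue);
                break;

            case CadEntityTypeName.MTEXT:
                AddIfNotEmpty(texts, ((CadMText)entity).Text);
                break;

            case CadEntityTypeName.ATTDEF:
                AddIfNotEmpty(texts, ((CadAttDef)entity).DefaultString);
                break;

            case CadEntityTypeName.ATTRIB:
                AddIfNotEmpty(texts, ((CadAttrib)entity).DefaultText);
                break;

            case CadEntityTypeName.INSERT:
            {
                CadInsertObject insert = (CadInsertObject)entity;

                // Attribute values attached to this particular insert.
                foreach (CadEntityBase child in insert.ChildObjects)
                {
                    CollectText(cadImage, child, texts, depth + 1);
                }

                // Entities of the block definition this insert refers to.
                // Note: a missing block name is not handled in this sketch.
                CadBlockEntity block = cadImage.BlockEntities[insert.Name];
                foreach (CadEntityBase blockEntity in block.Entities)
                {
                    CollectText(cadImage, blockEntity, texts, depth + 1);
                }
                break;
            }
        }
    }

    private static void AddIfNotEmpty(List<string> texts, string value)
    {
        if (!string.IsNullOrWhiteSpace(value))
        {
            texts.Add(value);
        }
    }
}
```

With this arrangement a block that is inserted several times contributes its text once per insert, while a block that is never inserted contributes nothing. If you would rather have each string only once for search indexing, collecting into a HashSet<string> instead of a List<string> is one option.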