Get all kinds of text in a .dwg file

Hi,


I'm trying to use Aspose.CAD to get every kind of text in a .dwg file.

So far I have been able to get some of it with the example “DWGDrawing” > “SearchTextInDWGAutoCADFile”, but some text is still missing. In particular, I'd like to get the information written in the title block, the text box where you usually find the names of the project/architect/companies…

The method “SearchTextInDWGAutoCADFile” contains this comment:

// Please note that we iterate through CadText entities here, but some other entities
// may contain text also, e.g. CadMText and others

So I'd like to know which “other” entities can contain text. And by the way, I don't really understand what an “entity” is in a .dwg file.

I have seen that there is a class named “DgnTextElement”, but I don't know how to use it.

Thanks for your help,

JG


Hi,


I have observed your comments. Can you please share the DWG file and also a snapshot highlighting what you are interested in extracting from it?

Can you please elaborate on the following comment, so that I can help you further?

“So I’d like to know which are the “others” entities where I could find text. And by the way I don’t get what really means an “entity” in a .dwg file.”

Best Regards,

Hi Muhammad,


Thanks for your help.

1) Enclosed in the .zip are the image and the .dwg. The text data I'd like is on the right side, especially the text in the pink frame.

2) I am trying to build my own application using the documentation and the following example shipped with Aspose.CAD: "DWGDrawing > searchTextInDWGAutoCADFile"

I am trying to understand how it works, but I don't understand how the .dwg is manipulated in the example.

For example, in the method “searchTextInDWGAutoCADFile()” (line 29 of the enclosed .txt), we iterate through “CadBaseEntity” objects, but I don't understand what these entities are. The comment on line 31 says that text could also be in “CadMText and others” entities. So what are “entities” in a .dwg file?

3) I've seen in the docs that there is a class named “DgnTextElement”, but I don't understand how to use it to import text from the .dwg.

JG


Hi JG,


Thank you for sharing the requested information.

I have worked with your DWG file and would like to share that the text is held in several kinds of base entities, including CadText, CadMText and CadInsertObject. For CadInsertObject, the text is held in child entities of type ATTDEF and ATTRIB (values of the CadEntityTypeName enumeration). Moreover, in DWG every entity can have child entities, which can have their own children, and so on. So one needs a recursive approach that traverses all base entities and extracts the text from those that hold it. The attached sample code implements such a recursive approach for your DWG file, and the extracted text is also attached for reference.
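To make the recursive approach concrete, here is a minimal Java sketch (the CADJAVA ticket suggests the Java edition is in use). It deliberately does not call Aspose.CAD: `EntityTextWalk`, `Kind` and `Entity` are hypothetical stand-ins for the real entity classes, assumed only to have a type, an optional text value and a list of child entities. The walk keeps the text of TEXT, MTEXT, ATTDEF and ATTRIB entities and descends into every child, so text nested inside a block reference (INSERT) is not skipped.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a DWG entity tree; these are NOT Aspose.CAD classes.
final class EntityTextWalk {

    // Kinds named after the CadEntityTypeName values mentioned in the thread; LINE is a non-text example.
    enum Kind { TEXT, MTEXT, INSERT, ATTDEF, ATTRIB, LINE }

    // Kinds whose own text value we want to keep.
    static final Set<Kind> TEXT_KINDS = EnumSet.of(Kind.TEXT, Kind.MTEXT, Kind.ATTDEF, Kind.ATTRIB);

    static final class Entity {
        final Kind kind;
        final String text;            // null for entities that carry no text
        final List<Entity> children;  // child entities; the structure is recursive

        Entity(Kind kind, String text, Entity... children) {
            this.kind = kind;
            this.text = text;
            this.children = List.of(children);
        }
    }

    // Depth-first: record this entity's text if it is a text-bearing kind, then visit every child.
    static void collect(Entity entity, List<String> out) {
        if (TEXT_KINDS.contains(entity.kind) && entity.text != null) {
            out.add(entity.text);
        }
        for (Entity child : entity.children) {
            collect(child, out);
        }
    }

    // Walks each top-level entity in order and returns all text found.
    static List<String> collectAll(List<Entity> roots) {
        List<String> out = new ArrayList<>();
        for (Entity root : roots) {
            collect(root, out);
        }
        return out;
    }
}
```

For example, a TEXT entity "Plan" followed by an INSERT whose only child is an ATTRIB "Architect" yields ["Plan", "Architect"] (made-up sample values). With the real library, the `switch` on `CadEntityTypeName` in the product team's code plays the role of `TEXT_KINDS`.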

Regarding your question about the DgnTextElement class: it belongs to a separate namespace for the DGN file format, so it does not apply to your DWG file. However, I have also added an example of loading and working with a DGN file in the attached code file.

I hope the shared information will be helpful.

Best Regards,

Your code seems more complete than mine, but you still get the same result as me. In fact, you don't access the text in the pink frame (the one in the layer named “-INEO-CARTOUCHE”), which contains the names of the architects, etc.


That's the main information I'd like to extract from this .dwg. More generally, I'd like to be sure that I get every kind of text included in any .dwg file.

So what does this mean for this specific text? That it is not included in an entity, or that it is included in an entity other than CadText, CadMText, etc.?

Hi JG,


Thanks for sharing the feedback. I have examined the layer you mentioned in Autodesk Viewer. That layer is being reached when traversing the entities, since part of its text, as highlighted in this image, is extracted. We need to investigate further, because the sample code already traverses all entities that hold text. A ticket with ID CADJAVA-130 has been added to our issue tracking system for further investigation. I have linked the ticket with this thread so that we can inform you once feedback is available.

We are sorry for the inconvenience.

OK, great, thanks Muhammad.

It seems that the text in the pink frame (CARTOUCHE) is a “CadInsertObject” entity, but I still can't get the text in it.


I've enclosed a .dwg that isolates the problematic layer.

Hi JG,


Thank you for sharing the reduced version of the DWG file that focuses only on the area from which text is not being extracted. I have updated the information in our issue tracking system and will get back to you as soon as our product team shares its findings.

Best Regards,

Hi JG,


Please find attached working sample code that traverses all entities to extract text. The desired text is extracted by this code, and the output is attached for your reference as well. I hope this will be helpful.

Best Regards,

Hi Muhammad,


It works like a charm, thanks again!!

A few last questions, to be sure I'm covering all the text in a file:

1) Can a “CadBlockEntity” be embedded in another “CadBlockEntity”? (If so, I would miss the text inside the embedded “CadBlockEntity”, right?)

2) Or can a “CadBlockEntity” be embedded in something else (e.g. a “CadBaseEntity” that is itself inside a “CadBlockEntity”)?

3) Do you have any idea whether text could be hidden in some other part of the file?

Sorry to be so meticulous :slight_smile:


Hi JG,

It’s good to know things have worked on your end.

The CadImage class has a getBlockEntities() method, which returns a CadBlockDictionary that in turn holds a dictionary of CadBlockEntity objects. Every CadBlockEntity is a block of CadBaseEntity objects and contains a list of them. Please see the API reference for the CadBlockEntity addEntity() method, which adds entities (CadBaseEntity) to a block. Moreover, no CadBlockEntity has a child list of further CadBlockEntity objects.

Secondly, the CadImage class also holds an array of CadBaseEntity objects, which can be accessed with the getEntities() method. Furthermore, every CadBaseEntity object has a list of child CadBaseEntity objects, so the structure is recursive. It follows that there is a single CadBlockEntity dictionary, defined only at the CadImage class level, and that no CadBaseEntity can hold a CadBlockEntity inside it. I hope this answers your first and second questions.
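The layout described in the two paragraphs above can be sketched as a small, library-free Java model. All names here (`DrawingTextScan`, `Drawing`, `Entity`) are hypothetical illustrations, not Aspose.CAD types: `entities` stands in for getEntities(), `blocks` for the single block dictionary returned by getBlockEntities(), and each block is assumed to be a flat list of entities that may themselves have children.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the drawing layout described above; NOT the Aspose.CAD API.
final class DrawingTextScan {

    static final class Entity {
        final String text;            // null when the entity carries no text
        final List<Entity> children;  // child entities (the structure is recursive)

        Entity(String text, Entity... children) {
            this.text = text;
            this.children = List.of(children);
        }
    }

    static final class Drawing {
        // Stand-in for getEntities(): the top-level entities of the image.
        final List<Entity> entities = new ArrayList<>();
        // Stand-in for getBlockEntities(): one dictionary of blocks, each a flat list of entities.
        final Map<String, List<Entity>> blocks = new LinkedHashMap<>();
    }

    // Depth-first walk over one entity and all of its descendants.
    private static void walk(Entity e, List<String> out) {
        if (e.text != null) {
            out.add(e.text);
        }
        for (Entity child : e.children) {
            walk(child, out);
        }
    }

    // Scans both places text can live: the top-level entities and every block definition.
    static List<String> scan(Drawing d) {
        List<String> out = new ArrayList<>();
        for (Entity e : d.entities) {
            walk(e, out);
        }
        for (List<Entity> blockEntities : d.blocks.values()) {
            for (Entity e : blockEntities) {
                walk(e, out);
            }
        }
        return out;
    }
}
```

Because, per the answer above, blocks do not contain further blocks, two loops plus the per-entity recursion reach every entity in this model.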

As for your question about hidden text inside a DWG file, we have already shared sample code covering the entities that can hold text. If you still find any text missing during extraction, please share it with us and we will be glad to help.

Best Regards,

@jg.try,

I would like to inform you that the product team has investigated the issue in detail, and I would like to share their findings. You need to look for entities not only in CadImage.Entities but also in CadImage.BlockEntities. The code below can extract the text from sample_file3.dwg.

```csharp
using System;
using Aspose.CAD;
using Aspose.CAD.FileFormats.Cad;
using Aspose.CAD.FileFormats.Cad.CadConsts;
using Aspose.CAD.FileFormats.Cad.CadObjects;

public static class DwgTextExtractor
{
    public static void Main()
    {
        // Load an existing DWG file as CadImage (verbatim string, so the backslashes are not escapes).
        using (CadImage cadImage = (CadImage)Image.Load(@"C:\dwg\sample_file3.dwg"))
        {
            // Search for text in the entities section
            foreach (var entity in cadImage.Entities)
            {
                IterateCADNodes(entity);
            }

            // Search for text in the block section
            foreach (CadBlockEntity blockEntity in cadImage.BlockEntities.Values)
            {
                foreach (var entity in blockEntity.Entities)
                {
                    IterateCADNodes(entity);
                }
            }
        }
    }

    // Prints the text held by one entity and recurses into the children of block references.
    private static void IterateCADNodes(CadBaseEntity obj)
    {
        switch (obj.TypeName)
        {
            case CadEntityTypeName.TEXT:
                CadText childObjectText = (CadText)obj;
                Console.WriteLine(childObjectText.DefaultValue);
                break;

            case CadEntityTypeName.MTEXT:
                CadMText childObjectMText = (CadMText)obj;
                Console.WriteLine(childObjectMText.Text);
                break;

            case CadEntityTypeName.INSERT:
                // A block reference holds its attribute entities as child objects; visit each one.
                CadInsertObject childInsertObject = (CadInsertObject)obj;
                foreach (var tempobj in childInsertObject.ChildObjects)
                {
                    IterateCADNodes(tempobj);
                }
                break;

            case CadEntityTypeName.ATTDEF:
                CadAttDef attDef = (CadAttDef)obj;
                Console.WriteLine(attDef.DefaultString);
                break;

            case CadEntityTypeName.ATTRIB:
                CadAttrib attAttrib = (CadAttrib)obj;
                Console.WriteLine(attAttrib.DefaultText);
                break;
        }
    }
}
```

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.