C# iTextSharp: Create PDF from HTML
How to convert HTML to PDF using iTextSharp

First, HTML and PDF are not related, although they were created around the same time. HTML is intended to convey higher-level information such as paragraphs and tables. Although there are methods to control it, it is ultimately up to the browser to draw these higher-level concepts. PDF is intended to convey documents, and the documents must look the same wherever they are rendered.

In an HTML document, a paragraph might wrap onto a different number of lines depending on the screen or page it is rendered on. A PDF file, however, must be independent of the rendering device, so regardless of your screen size it must always render exactly the same.

Because of the musts above, PDF doesn't support abstract things like "tables" or "paragraphs". There are three basic things that PDF supports: text, lines/shapes, and images. (There are other things like annotations and movies, but I'm trying to keep it simple here.) In a PDF you don't say "here's a paragraph, browser, do your thing". Instead you say, "draw this text at this exact X,Y location using this exact font, and don't worry, I've previously calculated the width of the text so I know it will all fit on this line".
You also don't say "here's a table". Instead you say, "draw this text at this exact location, and then draw a rectangle at this other exact location that I've previously calculated so I know it will appear to be around the text".

Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.NET, MVC, Razor, Struts, Spring, etc. are all HTML frameworks, but iText/iTextSharp is entirely unaware of them. The same goes for DataGridViews, Repeaters, Templates, Views, etc. It is your responsibility to get the HTML from your choice of framework; iText won't help you. If you get an exception saying "The document has no pages", or you think that "iText isn't parsing my HTML", it is almost certain that you don't actually have HTML, you only think you do.

Third, the built-in class that has been around for years is HTMLWorker, but it has been replaced by XMLWorker (Java / .NET). Zero work is being done on HTMLWorker, which doesn't support CSS files, has only limited support for the most basic CSS properties, and actually breaks on certain tags. If you do not see the HTML attribute or CSS property and value in this file, then it probably isn't supported by HTMLWorker. XMLWorker can be more complicated sometimes, but those complications also make it more extensible.

Below is C# code that shows how to parse HTML tags into iText abstractions that get automatically added to the document you are working on. C# and Java are very similar, so it should be relatively easy to convert this. Example 1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported, the class="headline" gets ignored, but everything else should work. Example 2 is the same as the first except that it uses XMLWorker instead. Example 3 also parses the simple CSS example.

    using System;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;

    // Create a byte array that will eventually hold our final PDF.
    byte[] bytes;

    // Boilerplate iTextSharp setup here.
    // Create a stream that we can write to, in this case a MemoryStream.
    using (var ms = new MemoryStream())
    {
        // Create an iTextSharp Document, which is an abstraction of a PDF but NOT a PDF.
        using (var doc = new Document())
        {
            // Create a writer that's bound to our PDF abstraction and our stream.
            using (var writer = PdfWriter.GetInstance(doc, ms))
            {
                // Open the document for writing.
                doc.Open();

                // Our sample HTML and CSS.
                // (The CSS rule only needs to target the "headline" class; this one is illustrative.)
                var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
                var example_css = @".headline{font-size:200%}";

                // Example 1: use the built-in HTMLWorker to parse the HTML.
                // Only inline CSS is supported.

                // Create a new HTMLWorker bound to our document.
                using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc))
                {
                    // HTMLWorker doesn't read a string directly but instead needs
                    // a TextReader (which StringReader subclasses).
                    using (var sr = new StringReader(example_html))
                    {
                        // Parse the HTML.
                        htmlWorker.Parse(sr);
                    }
                }

                // Example 2: use the XMLWorker to parse the HTML.
                // Only inline CSS and absolutely linked CSS is supported.

                // XMLWorker also reads from a TextReader and not directly from a string.
                using (var srHtml = new StringReader(example_html))
                {
                    // Parse the HTML.
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
                }

                // Example 3: use the XMLWorker to parse HTML and CSS.

                // In order to read CSS as a string we need to switch to a different
                // overload that takes Streams instead of TextReaders.
                // Below we convert the strings into UTF-8 byte arrays and wrap those in MemoryStreams.
                using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css)))
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html)))
                {
                    // Parse the HTML.
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }

                doc.Close();
            }
        }

        // After all of the PDF "stuff" above is done and closed, but before we
        // close the MemoryStream, grab all of the active bytes from the stream.
        bytes = ms.ToArray();
    }

    // Now we just need to do something with those bytes.
    // Here I'm writing them to disk, but if you were in ASP.NET you might Response.BinaryWrite() them.
    // You could also write the bytes to a database in a varbinary column (but please don't),
    // or hand them on for further PDF processing.
    var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
    System.IO.File.WriteAllBytes(testFile, bytes);

There is good news for HTML-to-PDF needs. As this answer showed, the W3C standard css-break-3 will solve the problem. It is a Candidate Recommendation, with plans to become a definitive Recommendation this year, after testing. There are also less standard solutions, with plugins for C#, as shown by print-css.
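To make the earlier point about PDF layout concrete, here is a minimal sketch of what "calculating the width of the text beforehand" means in practice. It does not use iTextSharp at all; the class and method names (FixedLayoutSketch, Layout, PlacedLine) are hypothetical, and it assumes every character has the same width, which real fonts do not. It only illustrates that, unlike HTML, someone has to decide where each line breaks and at which X,Y position it is drawn.

```csharp
using System;
using System.Collections.Generic;

public static class FixedLayoutSketch
{
    // One piece of text pinned to an absolute position, the way a PDF stores it.
    public sealed class PlacedLine
    {
        public PlacedLine(double x, double y, string text) { X = x; Y = y; Text = text; }
        public double X { get; }
        public double Y { get; }
        public string Text { get; }
    }

    // Breaks a paragraph into lines no wider than maxWidth and gives each line an X,Y position.
    // Assumption: every character is charWidth points wide; real fonts need per-glyph metrics.
    public static List<PlacedLine> Layout(string paragraph, double left, double top,
                                          double maxWidth, double charWidth, double lineHeight)
    {
        var lines = new List<PlacedLine>();
        var current = "";
        var y = top;

        foreach (var word in paragraph.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var candidate = current.Length == 0 ? word : current + " " + word;
            if (candidate.Length * charWidth > maxWidth && current.Length > 0)
            {
                // The word does not fit: emit the finished line and start a new one.
                lines.Add(new PlacedLine(left, y, current));
                y -= lineHeight; // PDF's default Y axis grows upward, so the next line sits lower.
                current = word;
            }
            else
            {
                current = candidate;
            }
        }

        if (current.Length > 0)
        {
            lines.Add(new PlacedLine(left, y, current));
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical values: a 36pt left margin, starting 756pt up the page, 120pt line width.
        var placed = Layout("This is some sample text that has to be wrapped by hand", 36, 756, 120, 6, 14);
        foreach (var line in placed)
        {
            Console.WriteLine($"({line.X}, {line.Y}) {line.Text}");
        }
    }
}
```

A library such as iTextSharp does this kind of calculation for you, using real font metrics, when it turns parsed HTML into its own abstractions. The sketch is only meant to show why the same paragraph cannot simply "reflow" once it is in a PDF.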