Tuesday, April 7, 2009

How to convert a web page to PDF

In this post I will show how to convert a web page to PDF using iTextSharp. The page markup (Pdf.aspx) comes first, followed by its code-behind (Pdf.aspx.cs).

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Pdf.aspx.cs" Inherits="Pdf" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <asp:PlaceHolder ID="PlaceholderPdf" runat="server"></asp:PlaceHolder>
    <div>
        <table border="1">
            <tr>
                <td colspan="2">
                    aspdotnetcodebook
                </td>
            </tr>
            <tr>
                <td>
                    cell1
                </td>
                <td>
                    cell2
                </td>
            </tr>
            <tr>
                <td colspan="2">
                    <asp:Label ID="lblLabel" runat="server" Text="Label Test"></asp:Label>
                </td>
            </tr>
        </table>
    </div>
    </form>
</body>
</html>

// Pdf.aspx.cs (code-behind for the page above)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.IO;
using System.Text.RegularExpressions;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html;
using iTextSharp.text.xml;
using System.Xml;
using iTextSharp.text.html.simpleparser;
public partial class Pdf : System.Web.UI.Page
{
    protected override void Render(HtmlTextWriter writer)
    {
        // Render the page into an in-memory buffer instead of the response.
        MemoryStream mem = new MemoryStream();
        StreamWriter twr = new StreamWriter(mem);
        HtmlTextWriter myWriter = new HtmlTextWriter(twr);
        base.Render(myWriter);
        myWriter.Flush();
        myWriter.Dispose();
        // Rewind the buffer and read the rendered HTML back as a string.
        StreamReader strmRdr = new StreamReader(mem);
        strmRdr.BaseStream.Position = 0;
        string pageContent = strmRdr.ReadToEnd();
        strmRdr.Dispose();
        mem.Dispose();
        // Write the HTML to the real writer, then build the PDF from the same markup.
        writer.Write(pageContent);
        CreatePDFDocument(pageContent);
    }
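    public void CreatePDFDocument(string strHtml)
    {
        // The PDF is written next to the page, so the ASP.NET account needs write access here.
        string strFileName = HttpContext.Current.Server.MapPath("test.pdf");
        // step 1: creation of a document-object
        Document document = new Document();
        // step 2: create a writer that listens to the document and writes to the file
        PdfWriter.GetInstance(document, new FileStream(strFileName, FileMode.Create));
        // step 3: parse the rendered HTML into the open document
        StringReader se = new StringReader(strHtml);
        HTMLWorker obj = new HTMLWorker(document);
        document.Open();
        obj.Parse(se);
        // step 4: close the document to finish the PDF
        document.Close();
        // step 5: send the finished file to the browser
        ShowPdf(strFileName);
    }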
    public void ShowPdf(string strFileName)
    {
        Response.ClearContent();
        Response.ClearHeaders();
        // Send only the file name in the header, not the full server path.
        Response.AddHeader("Content-Disposition", "inline;filename=" + Path.GetFileName(strFileName));
        Response.ContentType = "application/pdf";
        Response.WriteFile(strFileName);
        Response.Flush();
        Response.Clear();
    }
}

40 comments:

Priya Dev said...

Very good job....It helped me a lot..

Anonymous said...

To create a PDF file simply and quickly, use an .rdlc file in Visual Studio. There is a WYSIWYG designer that helps you do it :)

Anonymous said...

Hi, I am getting the following error.
Any ideas?


The network path was not found.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.IO.IOException: The network path was not found.


Source Error:


Line 52: HTMLWorker obj = new HTMLWorker(document);
Line 53: document.Open();
Line 54: obj.Parse(se);
Line 55: document.Close();
Line 56: ShowPdf(strFileName);

martijnramesdonk1 said...

This piece of code is not working very well: tables are always 100% width and images are not in position.

Does anybody have any ideas?

Anonymous said...

I am also getting an error, and the code is not working. It stopped at obj.Parse(se);
and says: Could not find a part of the path 'C:\Images\logo.JPG'.

Please help!

santosh said...

Hi,
Please debug and send me the exact error message.

Raj said...

I am Rajesh M Somvanshi.
I use iTextSharp.dll to convert an ASPX page to PDF format. That works, but my CSS is not applied in the PDF file.

Please help me.

santosh said...

Hi Rajesh,
If you want to keep the CSS styling, you can create the table structure manually. For more details, check out this link:
http://forums.asp.net/t/1433490.aspx

GyanPrakash said...

Hi all,
If the page contains an image control or file, how would we render it to PDF?

santosh said...

Hi GyanPrakash,
Have you tried my code?

GyanPrakash said...

Yes, I have tried it. It works only for simple pages;
if the page contains any images, it does not work.

santosh said...

OK, I will look into this and get back to you soon.

Anonymous said...

Hi Santhosh,

I am getting the following error while running this sample:
"Access to the path 'C:\' is denied"
The error is raised in the CreatePDFDocument() method at the line obj.Parse(se);

Please help me as soon as possible; it is very urgent for me. Any help will be highly appreciated.

Thanks,
kumar

santosh said...

Hi Kumar,
From your error message it is clear that the ASP.NET user does not have write access to the C drive.
Create a temp folder on the C drive and write the newly created PDF to that folder:

string strFileName = @"C:\temp\test.pdf";
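
The advice above can be sketched as a small helper. This is only an illustration: the class name PdfOutputPath and the method Build are hypothetical and not part of the original post, and it uses nothing beyond System.IO. It makes sure the target folder exists before the PDF is written there. The account that runs the ASP.NET application still needs write permission on that folder.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original post): builds the output path
// for the PDF and makes sure the target folder exists.
public static class PdfOutputPath
{
    public static string Build(string folder, string fileName)
    {
        // Creates the folder if it is missing; does nothing if it already exists.
        Directory.CreateDirectory(folder);
        // GetFileName drops any directory part that was passed in by mistake.
        return Path.Combine(folder, Path.GetFileName(fileName));
    }
}
```

In CreatePDFDocument this could replace the Server.MapPath call, for example: string strFileName = PdfOutputPath.Build(@"C:\temp", "test.pdf");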

Som said...

Hi Santosh,
Could you please suggest which variable should be used in the Render method?

Tom said...

I use the online service http://www.web2pdfconvert.com/pdf-api.aspx
It works great: I just pass the URL of the web page I want to convert using REST and get the PDF back as a stream. But this solution only works if the web pages are accessible online.

jay said...

Hi, your code is actually very helpful, but it
does not work with images.
It gives an error in
obj.Parse(se);
so if there is a solution, please tell me.

Anonymous said...

Hi,
This is very good,
but there is one problem: iTextSharp does not pick up styles,

for example

width:500px

Kaustubh said...

If you want an image, you need to write the full server path of the image in the img tag, like:

src="http://localhost:1060/Project/images/image1.jpg" alt="ImageName"

cheers!!!
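
A minimal sketch of that idea, assuming the page HTML is already available as a string (as pageContent is in the post). The class name ImageUrlFixer and the method MakeAbsolute are hypothetical. The sketch uses a regular expression rather than a real HTML parser, so it rewrites every src attribute it finds (script tags included) and may miss unusual markup.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): rewrites relative
// src="..." values so they become absolute URLs under a given base address.
public static class ImageUrlFixer
{
    // Matches src="..." or src='...' and captures the quote character and the URL.
    private static readonly Regex SrcAttribute = new Regex(
        "src\\s*=\\s*(?<q>[\"'])(?<url>.*?)\\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string MakeAbsolute(string html, Uri baseUri)
    {
        return SrcAttribute.Replace(html, m =>
        {
            Uri absolute;
            // Leave the attribute untouched if the URL cannot be resolved.
            if (!Uri.TryCreate(baseUri, m.Groups["url"].Value, out absolute))
                return m.Value;
            string q = m.Groups["q"].Value;
            return "src=" + q + absolute.AbsoluteUri + q;
        });
    }
}
```

It could be called on the HTML just before it is passed to CreatePDFDocument, for example: pageContent = ImageUrlFixer.MakeAbsolute(pageContent, new Uri("http://localhost:1060/Project/")); The trailing slash on the base address matters, because relative paths are resolved against the last folder in it.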

vikas said...

I am getting an error on this line:
obj.Parse(se);

The errors are:
1.The best overloaded method match for 'iTextSharp.text.html.simpleparser.HTMLWorker.Parse(System.IO.StreamReader)' has some invalid arguments E:\vikas\reporteg1\WebApplication7\Default.aspx.cs 232 14 WebApplication7
2.Argument '1': cannot convert from 'System.IO.StringReader' to 'System.IO.StreamReader' E:\vikas\reporteg1\WebApplication7\Default.aspx.cs 232 24 WebApplication7


Can you tell me what is wrong?

vinaykkk said...

Hi,

I am Vinay. Thank you very much for your code.

Izhaar said...

Use basic HTML attributes for styling, like bgcolor, width, border=0.5, align, font color, size.

For an image:

iTextSharp.text.Image gif = iTextSharp.text.Image.GetInstance((Server.MapPath("/") + "123451.jpg"));

document.Add(gif);

anki said...

Hi, I am using your code with placeholder1, which contains a table. I got this error, please help me:


System.InvalidCastException: Unable to cast object of type 'iTextSharp.text.html.simpleparser.CellWrapper' to type 'iTextSharp.text.Rectangle'.

Anonymous said...

Hi, I am using your code with a placeholder that contains 5 or 6 big images, and I got an error.

Anonymous said...

Hi,
I'm using the above code. There is an image in my page. While executing, it gives the error "Input string is not in correct format" on the line

obj.Parse(se);

Can anybody help me resolve this?

Unknown said...

Hi, if there are merged cells in a GridView, the merging disappears in the PDF.

nishant chandwani said...

I got an iText error. Could you please tell me how to solve this namespace error?

santosh said...

Have you downloaded the iTextSharp library?

vasundhara said...

Hi, I have uploaded the Word document using the code provided on your blog:


public partial class Default5 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
    }

    protected void btnRead_Click(object sender, EventArgs e)
    {
        ApplicationClass wordApp = new ApplicationClass();
        string filePath = FileUpload1.PostedFile.FileName;
        object file = filePath;
        object nullobj = System.Reflection.Missing.Value;

        Document doc = wordApp.Documents.Open(ref file,
            ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj,
            ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj,
            ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj);

        Document doc1 = wordApp.ActiveDocument;
        string m_Content = doc1.Content.Text;

        TextBox1.Text = m_Content;

        doc.Close(ref nullobj, ref nullobj, ref nullobj);
    }
}



It is working fine. I have a requirement: after uploading the Word document, if I make any modifications to it, they should be saved. How can I do this?

Anish said...

Hi, can you let me know how to extract a specific named div? I want to export only a part of the page. Thanks for the help.

Anonymous said...

Hi, thanks for the code. Can you tell me how you are calling the "Render" method, though? Thanks.

Balaji said...

I converted an ASPX page to PDF, but the CSS design does not appear in the PDF.

Anonymous said...

Can you please tell me what the use of the Render method is here?

vipulmuralidhar said...

thnxxxx

Anonymous said...

Hi, can you please help me? I am getting the error "Access to the path 'C:\' is denied." at

htmlparser.Parse(sr);

Below is my code. Please help, it's urgent.

Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=" + filename + ".pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
ExportText = FormatText(ExportText);
StringReader sr = new StringReader(ExportText);
Document pdfDoc = new Document(PageSize.A4, 20, 20, 50, 40);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
pdfDoc.Close();
Response.Write(pdfDoc);
Response.End();

Shivani Gupta said...

Good blog. It helped me out, thanks again :)

pal said...

I have a link in a GridView. When I click the pdf_click button, this error arises in my VB.NET code:
Unable to cast object of type 'iTextSharp.text.html.simpleparser.CellWrapper' to type 'iTextSharp.text.Paragraph'.
What can I do?
