Remove HTML tags using C#
I have a set of data consisting of web pages saved in .txt format, and I want to remove the HTML tags from it.
Please write a few simple lines of code for me in a high-level programming language such as C# or C++.
Thanks in advance.
Hi
As I understand your question, you want to remove the HTML tags from a .txt file. Follow these steps:
1. First, open the file and remove all text of the form <...>.
2. Second, remove the content between tags that lies outside the body tag.
3. Then remove the tags under the body tag.
That is all you need to do, and these simple steps should sort out your problem. You asked for C++ code, but I do not know of any ready-made code for removing the tags, so you will have to do this by hand. I hope this helps you solve your problem.
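The three steps above could be written by hand in C++ roughly as follows. This is only a minimal sketch, not a full HTML parser: the function names extract_body and strip_tags are made up for illustration, and it assumes that the tags are well formed and that < and > do not appear inside attribute values or comments.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Lowercase copy, used so that <BODY> and <body> are both found.
static std::string to_lower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Step 2: keep only what lies between <body ...> and </body>.
// If no body tag is found, the input is returned unchanged.
std::string extract_body(const std::string& html) {
    const std::string lower = to_lower(html);
    const std::size_t open = lower.find("<body");
    if (open == std::string::npos) return html;
    const std::size_t start = lower.find('>', open);
    if (start == std::string::npos) return html;
    const std::size_t close = lower.find("</body", start);
    const std::size_t end = (close == std::string::npos) ? html.size() : close;
    return html.substr(start + 1, end - start - 1);
}

// Steps 1 and 3: drop every "<...>" span and keep the text outside it.
std::string strip_tags(const std::string& html) {
    std::string out;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;           // a tag starts here
        } else if (c == '>' && in_tag) {
            in_tag = false;          // the tag ends here
        } else if (!in_tag) {
            out += c;                // ordinary text outside any tag
        }
    }
    return out;
}
```

Calling strip_tags(extract_body(text)) on the contents of one of the .txt files should give the text of the body without its tags. Text inside script or style elements within the body is kept by this sketch and would need separate handling.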
You can easily remove the HTML tags from web search results using C#. You can learn C# or C++ from MSDN, which serves as a guide to the Visual Studio platform. Follow this link:
The link above explains how regular expressions (regex) can be used to remove HTML tags, XML tags and the nodes between them, and other ordinary tags from web results, returning the filtered text from the .txt files. Here is some code that should work.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Path of the saved .txt file; change it to match your machine.
        string str = @"C:\Users\kiran\Desktop\apple\Page0.txt";
        // Read the whole text file into a string.
        string readText = File.ReadAllText(str);
        // Put the stripped text into a StringBuilder.
        StringBuilder sb = new StringBuilder();
        sb.Append(HtmlStrip(readText));
        // Print the text with the tags removed.
        Console.WriteLine(sb.ToString());
        Console.ReadLine();
    }

    public static string HtmlStrip(string input)
    {
        // Remove style sheet blocks together with their contents.
        input = Regex.Replace(input, @"<style[^>]*>(.|\n)*?</style>", string.Empty);
        // Remove xml tags and the nodes between them.
        input = Regex.Replace(input, @"<xml[^>]*>(.|\n)*?</xml>", string.Empty);
        // Remove all remaining tags, such as </br>.
        return Regex.Replace(input, @"<(.|\n)*?>", string.Empty);
    }
}
Compile and run this code in Visual C#. I hope the output meets your requirement.