How to extract text between HTML or XML tags?

Sometimes we need to pull specific pieces of data out of HTML or XML,
either to process them or to display them more cleanly.
This can be done with TWebBrowser, but that forces us to load the
document into the component, and manipulating it there is not
very easy.
But let’s say we already have the HTML or XML in a string variable, or we
retrieve it in Delphi through a Get with Indy (IdHTTP).
How can we extract the data?
Fortunately, a very simple function can help us.
Let’s go to the source code:
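function ExtractText(aText, OpenTag, CloseTag : String) : String;
{ Returns the text between two tags (the open and close tags) }
var
  iAux, kAux : Integer;
begin
  Result := '';
  iAux := Pos(OpenTag, aText);
  if iAux <> 0 then
  begin
    // First character after the opening tag
    iAux := iAux + Length(OpenTag);
    // Look for the closing tag only in the text that follows the opening tag;
    // kAux is therefore relative to iAux, not to the start of aText
    kAux := Pos(CloseTag, Copy(aText, iAux, MaxInt));
    if kAux <> 0 then
      Result := Copy(aText, iAux, kAux - 1);
  end;
end;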
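Note that ExtractText returns only the first match. When the same tag appears several times, a loop along the following lines could collect every occurrence. This is only a sketch: ExtractAllText is a hypothetical helper (it is not part of the function above), it relies on PosEx from the StrUtils unit, and the sample list markup is made up for the demonstration.

```pascal
program ExtractAllDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils, Classes;

// Hypothetical helper: adds the text of every OpenTag...CloseTag pair to Items.
procedure ExtractAllText(const aText, OpenTag, CloseTag: string; Items: TStrings);
var
  OpenPos, TextStart, ClosePos: Integer;
begin
  Items.Clear;
  OpenPos := PosEx(OpenTag, aText, 1);
  while OpenPos > 0 do
  begin
    TextStart := OpenPos + Length(OpenTag);
    // Search for the closing tag only after the current opening tag
    ClosePos := PosEx(CloseTag, aText, TextStart);
    if ClosePos = 0 then
      Break; // opening tag without a closing tag: stop here
    Items.Add(Copy(aText, TextStart, ClosePos - TextStart));
    // Continue searching after the closing tag just consumed
    OpenPos := PosEx(OpenTag, aText, ClosePos + Length(CloseTag));
  end;
end;

var
  Items: TStringList;
  i: Integer;
begin
  Items := TStringList.Create;
  try
    // Sample markup, invented for this example
    ExtractAllText('<ul><li>One</li><li>Two</li></ul>', '<li>', '</li>', Items);
    for i := 0 to Items.Count - 1 do
      Writeln(Items[i]);
  finally
    Items.Free;
  end;
end.
```

Run as a console program, this prints One and Two on separate lines. Like ExtractText, it is plain string searching, so nested tags of the same name or tags with attributes (for example <li class="x">) are not handled.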
Example of use:
procedure TForm1.Button1Click(Sender: TObject);
const
  HTML = '<html>'+
         '<head>'+
         '<title>SHOW DELPHI</title>'+
         '</head>'+
         '<body>'+
         '<h1>Titulo 1</h1>'+
         '<h2>Titulo 2</h2>'+
         '</body>'+
         '</html>';
var
  variavelString : string;
begin
  variavelString := ExtractText(HTML, '<h1>', '</h1>');
  ShowMessage(variavelString);
end;
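For the Indy case mentioned at the start, the page can be downloaded with TIdHTTP.Get and passed straight to ExtractText. Here is a minimal sketch, assuming Indy is installed and the machine has network access. The URL is only a placeholder, and an https address would additionally need SSL support configured for Indy, which is left out here. ExtractText is repeated in compact form so that the sketch compiles on its own.

```pascal
program FetchTitleDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP;

// Compact copy of ExtractText, included only to keep this sketch self-contained
function ExtractText(const aText, OpenTag, CloseTag: string): string;
var
  iAux, kAux: Integer;
begin
  Result := '';
  iAux := Pos(OpenTag, aText);
  if iAux = 0 then
    Exit;
  iAux := iAux + Length(OpenTag);
  // Position of the closing tag, relative to the text after the opening tag
  kAux := Pos(CloseTag, Copy(aText, iAux, MaxInt));
  if kAux > 0 then
    Result := Copy(aText, iAux, kAux - 1);
end;

var
  Http: TIdHTTP;
  Page: string;
begin
  Http := TIdHTTP.Create(nil);
  try
    try
      // Placeholder URL (an assumption); replace it with the page you need
      Page := Http.Get('http://example.com/');
      Writeln(ExtractText(Page, '<title>', '</title>'));
    except
      on E: Exception do
        Writeln('Request failed: ', E.Message);
    end;
  finally
    Http.Free;
  end;
end.
```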
I hope it will be useful to everyone!