<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16544" name=GENERATOR>
<STYLE>@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }
P.MsoNormal {
FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
LI.MsoNormal {
FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
DIV.MsoNormal {
FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
A:link {
COLOR: blue; TEXT-DECORATION: underline
}
SPAN.MsoHyperlink {
COLOR: blue; TEXT-DECORATION: underline
}
A:visited {
COLOR: purple; TEXT-DECORATION: underline
}
SPAN.MsoHyperlinkFollowed {
COLOR: purple; TEXT-DECORATION: underline
}
SPAN.EmailStyle17 {
COLOR: windowtext; FONT-FAMILY: Arial; mso-style-type: personal-compose
}
DIV.Section1 {
page: Section1
}
</STYLE>
</HEAD>
<BODY lang=EN-US vLink=purple link=blue>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=328471017-23102007>I pipe my alerts through a Perl script.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=328471017-23102007></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=328471017-23102007>Below is the portion that strips the HTML. The main message
is held in $body here:</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff
size=2></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff
size=2></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2># right below here
I strip out the HTML</FONT></DIV>
<DIV> </DIV><FONT face=Arial color=#0000ff size=2>
<DIV dir=ltr align=left><BR>$body =~ s{
<! # comments begin
with a
`<!'<BR>
# followed by 0 or more comments;</DIV>
<DIV> </DIV>
<DIV dir=ltr align=left>
(.*?)
# this is actually to eat up comments in non
<BR>
# random places</DIV>
<DIV> </DIV>
<DIV dir=ltr align=left>
(
# not supposed to have any white space here</DIV>
<DIV> </DIV>
<DIV dir=ltr
align=left>
# just a quick start; <BR>
--
# each comment starts with a `--'<BR>
.*? #
and includes all text up to and including<BR>
--
# the *next* occurrence of `--'<BR>
\s* # and may have trailing white space<BR>
# (albeit not leading white space XXX)<BR>
)+
# repetire ad libitum XXX should be * not +<BR>
(.*?)
# trailing non comment text<BR>
>
# up to a `>'<BR>}{<BR> if ($1 || $3)
{ # this silliness for embedded comments in
tags<BR> "<!$1
$3>";<BR> }
<BR>}gesx;
# mutate into nada, nothing, and niente</DIV>
<DIV> </DIV>
<DIV dir=ltr align=left>$body =~ s{
<
# opening angle bracket</DIV>
<DIV> </DIV>
<DIV dir=ltr align=left>
(?:
# Non-backreffing grouping
paren<BR> [^>'"]
* # 0 or more things that are neither >
nor ' nor
"<BR>
|
# or else<BR>
".*?" # a section between
double quotes (stingy
match)<BR>
|
# or else<BR>
'.*?' # a section between
single quotes (stingy match)<BR> )
+
# repetire ad
libitum<BR>
# hm.... are null tags <> legal? XXX<BR>
>
# closing angle
bracket<BR>}{}gsx;
# mutate into nada, nothing, and niente</DIV>
<DIV> </DIV>
<DIV dir=ltr align=left>$body =~ s{
(<BR>
& # an entity starts with an ampersand<BR> (
<BR>
\x23\d+ # and is either a pound (#) and
numbers<BR>
| # or
else<BR>
\w+ # has alphanumerics or underscores up to a semicolon<BR>
)
<BR>
;? # a semicolon terminates AS DOES ANYTHING ELSE (XXX)<BR> )<BR>} {</DIV>
<DIV> </DIV>
<DIV dir=ltr align=left>
$entity{$2} # if it's a known entity
use that<BR>
||
# but otherwise<BR>
$1 #
leave what we'd found; NO WARNINGS (XXX)</DIV>
<DIV> </DIV>
<DIV dir=ltr
align=left>}gex;
# execute replacement -- that's code not a string<BR></FONT></DIV><BR>
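<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2>The three
substitutions above depend on an %entity table that is not shown in this message.
As a rough, self-contained sketch of how the pieces could fit together: the
strip_html name and the small %entity table below are assumptions made for
illustration, not part of the script quoted above, and the regexes are the same
ones with the comments removed.</FONT></DIV>

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny entity table. The post above never shows
# %entity, so extend this to whatever your alerts actually contain.
my %entity = (
    lt   => '<',
    gt   => '>',
    amp  => '&',
    quot => '"',
    nbsp => ' ',
);

# strip_html is a name invented for this sketch.
sub strip_html {
    my ($body) = @_;

    # 1. Remove comments inside <! ... > while keeping any
    #    non-comment text found in the same declaration.
    $body =~ s{ <! (.*?) ( -- .*? -- \s* )+ (.*?) > }{
        ($1 || $3) ? "<!$1 $3>" : "";
    }gesx;

    # 2. Remove tags, allowing for '>' inside quoted attribute values.
    $body =~ s{ < (?: [^>'"]* | ".*?" | '.*?' )+ > }{}gsx;

    # 3. Replace known entities; leave unknown ones untouched.
    $body =~ s{ ( & ( \x23\d+ | \w+ ) ;? ) }{
        exists $entity{$2} ? $entity{$2} : $1;
    }gex;

    return $body;
}
```

<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2>To use it as a
filter, one option is to slurp standard input into $body (for example with
local $/ set to undef) and print strip_html($body). Entities stay encoded until
the last step so that a decoded &lt; or &gt; is never mistaken for part of a
tag.</FONT></DIV>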
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> James Wade
[mailto:jkwade@futurefrontiers.com] <BR><B>Sent:</B> Tuesday, October 23, 2007
12:34 PM<BR><B>To:</B> hobbit@hswn.dk<BR><B>Subject:</B> [hobbit] Paging -- HTML
to Text<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Can anyone point me to a good HTML
to Text Converter?<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">I’m sending out pages and the
incoming message is HTML,<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">but I want to send it out as text. I
want to take all the HTML<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">imbedded in it out. I’m looking on
the web, but I can’t seem<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">to find anything that will allow me
to pipe it through the command.<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Thanks…James<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P></DIV></BODY></HTML>