Removing any and all inline styles from the_content()

For one of my current projects, I had to transfer blog posts from an old WordPress site to my project.

Things went smoothly until I saw that all the posts had been copied and pasted from Word, leaving this before pretty much every paragraph:

<span style="font-size: medium; font-family: georgia,palatino;">

And in some places, things like these:

<p style="text-align: justify;">
<p style="text-align: justify;"><span style="font-size: medium; font-family: georgia,palatino;"><strong><span style="color: #000000;">

Since I don’t have the 40 hours (let alone the patience) to go into every post (there are about 100) and remove those unwanted attributes by hand, I’m looking for a filter that would remove all style attributes (except maybe those containing text-decoration: underline) before the_content() is output.

Is there such a thing?

Solutions:

Several solutions are listed below. We recommend the first one, which we have tested.

Solution 1

To remove all inline styles, add the following code to functions.php.

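add_filter('the_content', function( $content ){
    // Remove all inline style attributes. The \1 backreference makes the
    // closing quote match the opening one, so a quote of the other kind
    // inside the value does not end the match early.
    $content = preg_replace('/\s+style\s*=\s*(["\'])(.*?)\1/is', '', $content);
    return $content;
}, 20);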
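
The question also asks about keeping text-decoration: underline. A minimal sketch of that variant is shown here, assuming it is enough to reduce such a style attribute to the underline declaration alone. The function name strip_styles_keep_underline is hypothetical and not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: strip inline style attributes, but keep
 * "text-decoration: underline" when the original style declared it.
 */
function strip_styles_keep_underline( $content ) {
    return preg_replace_callback(
        '/\s+style\s*=\s*(["\'])(.*?)\1/is',
        function ( $m ) {
            // $m[2] holds the raw style value, e.g. "color: red; text-decoration: underline".
            if ( preg_match( '/text-decoration\s*:\s*underline/i', $m[2] ) ) {
                return ' style="text-decoration: underline;"';
            }
            // Any other style attribute is dropped entirely.
            return '';
        },
        $content
    );
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'strip_styles_keep_underline', 20 );
}
```

If you paste this into an existing functions.php, leave out the opening <?php tag.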

Solution 2

Just add this to your functions.php.

Note: This filter runs when a post is saved or updated.

add_filter( 'wp_insert_post_data', 'filter_post_data', 99, 2 );

function filter_post_data( $data, $postarr ) {
    // Reduce each opening tag to its bare form. This drops every attribute
    // (style, class, id, ...), not only style, and also covers nested tags.
    $data['post_content'] = preg_replace( '#<(p|span|ol|ul|li)\b[^>]*>#i', '<$1>', $data['post_content'] );

    return $data;
}

Note: This filter runs when the_content() is called, so the stored post is left unchanged.

add_filter( 'the_content', 'the_content_filter', 20 );

function the_content_filter( $content ) {
    // Reduce each opening tag to its bare form. This drops every attribute
    // (style, class, id, ...), not only style, and also covers nested tags.
    return preg_replace( '#<(p|span|ol|ul|li)\b[^>]*>#i', '<$1>', $content );
}

Solution 3

I tried the save/update method above, but it didn’t work for me, so I took another approach. I exported the whole wp_posts table, opened it in Sublime, and did a regex replace. I used style="*.*?" to find all occurrences and replaced them with an empty string. Then I dropped the old table’s content and imported the new one.

If anyone tries this method, please make sure you have a clean backup first, in case there are other post types in the wp_posts table and things get a bit messy.

Solution 4

I would check out the content_save_pre filter, and probably apply some fancy regex at that point.
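
A minimal sketch of what that could look like follows. The function names are hypothetical. It also assumes that the value passed to content_save_pre is slash-escaped at that point, so the sketch unslashes before matching and re-slashes afterwards. Verify that assumption against your WordPress version before relying on it.

```php
<?php
/**
 * Hypothetical helper: remove every inline style attribute from an HTML string.
 */
function strip_inline_styles( $html ) {
    return preg_replace( '/\s+style\s*=\s*(["\'])(.*?)\1/is', '', $html );
}

/**
 * Hypothetical callback for content_save_pre.
 * Assumption: $content arrives slash-escaped, so it is unslashed before
 * the regex runs and slashed again before being handed back to WordPress.
 */
function strip_inline_styles_on_save( $content ) {
    return wp_slash( strip_inline_styles( wp_unslash( $content ) ) );
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'content_save_pre', 'strip_inline_styles_on_save' );
}
```

Unlike a the_content filter, this changes what is stored, so existing posts are only cleaned the next time they are saved.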

Note: We recommend Solution 1, as we have tested this method on our system.
Thank you 🙂

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
