mouthporn.net
#data – @hermajestyschimera on Tumblr

Antlers of Iron

@hermajestyschimera / hermajestyschimera.tumblr.com

Illustration, graphic design, fashion, dank memes. Some Toonami liveblogging spam on Saturday nights (US). Occasional posts on politics.
engineering

Categorizing Posts on Tumblr

Millions of posts are published on Tumblr every day. Understanding the topical structure of this massive collection of data is a fundamental step toward connecting users with the content they love, as well as answering important philosophical questions, such as “cats vs. dogs: who rules on social networks?”

As a first step in this direction, we recently developed a post-categorization workflow that aims at associating posts with broad-interest categories, where the list of categories is defined by Tumblr’s on-boarding topics.

Methodology

Posts are heterogeneous in form (video, images, audio, text) and consist of semi-structured data (e.g. a textual post has a title and a body, but the actual textual content is unstructured). Luckily enough, our users do a great job of summarizing the content of their posts with tags. As the distribution below shows, more than 50% of the posts are published with at least one tag.

However, tags define micro-interest segments that are too fine-grained for our goal. Hence, we editorially aggregate tags into semantically coherent topics: our on-boarding categories.

We also compute a score that represents the strength of the affiliation (tag, topic), which is based on approximate string matching and semantic relationships.

Given this input, we can compute a score for each pair (post,topic) as:

where

  • w(f,t) is the score (tag,topic), or zero if the pair (f,t) does not belong in the dictionary W.
  • tag-features(p) contains features extracted from the tags associated with the post: raw tag, “normalized” tag, n-grams.
  • q(f,p) is a weight in [0,1] that takes into account the source of the feature (f) in the post (p).
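The formula itself did not survive in this copy of the post. One form consistent with the three definitions above is a weighted sum over the post's tag features, score(p,t) = Σ q(f,p) · w(f,t); that is an assumption, not necessarily the exact expression used. The Python sketch below follows that assumption. The dictionary entries, the q values per feature source (1.0 raw, 0.9 normalized, 0.5 n-gram), and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary W: (feature, topic) -> affiliation score w(f, t).
W: Dict[Tuple[str, str], float] = {
    ("cat", "Animals"): 0.9,
    ("kitten", "Animals"): 0.8,
    ("runway", "Fashion"): 0.7,
}

# Assumed source weights q: how much we trust each kind of feature.
Q_RAW, Q_NORMALIZED, Q_NGRAM = 1.0, 0.9, 0.5


def tag_features(tags: List[str]) -> Dict[str, float]:
    """Map each feature of a post's tags to its weight q(f, p).

    A feature reachable through several sources keeps its largest weight.
    """
    features: Dict[str, float] = {}

    def add(feature: str, q: float) -> None:
        features[feature] = max(features.get(feature, 0.0), q)

    for tag in tags:
        add(tag, Q_RAW)                      # raw tag as typed
        normalized = tag.strip().lower()
        add(normalized, Q_NORMALIZED)        # "normalized" tag
        for word in normalized.split():      # unigrams only, for brevity
            add(word, Q_NGRAM)
    return features


def post_topic_scores(
    tags: List[str], w: Dict[Tuple[str, str], float] = W
) -> Dict[str, float]:
    """Assumed score(p, t): sum of q(f, p) * w(f, t) over features in W."""
    features = tag_features(tags)
    scores: Dict[str, float] = {}
    for (feature, topic), weight in w.items():
        q = features.get(feature)
        if q is not None:                    # pairs outside W contribute zero
            scores[topic] = scores.get(topic, 0.0) + q * weight
    return scores
```

Calling `post_topic_scores(["Cat", "runway show"])` scores Animals through the normalized tag "cat" and Fashion through the unigram "runway"; a post whose tags match nothing in W gets an empty result.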

The drawback of this approach is that it relies heavily on the dictionary W, which is far from being complete.

To address this issue, we exploit another source of data: RelatedTags, an index that provides a list of similar tags by exploiting co-occurrence patterns. For each pair (tag,topic) in W, we propagate the affiliation with the topic to its top related tags, smoothing the affiliation score w to reflect the fact that these entries (tag,topic) could be noisy.
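As a rough illustration of the propagation step, the sketch below copies each (tag,topic) affiliation to the tag's top related tags with a discounted score. The post does not say how the smoothing is done, so the constant `damping` factor, the `top_k` cutoff, and the shape of the `related_tags` lookup are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Affiliations = Dict[Tuple[str, str], float]


def propagate(
    w: Affiliations,
    related_tags: Dict[str, List[str]],
    top_k: int = 2,
    damping: float = 0.5,
) -> Affiliations:
    """Extend W with entries for the top related tags of each known tag.

    related_tags maps a tag to its similar tags, most similar first
    (a stand-in for the RelatedTags index). Propagated scores are
    multiplied by `damping` because they may be noisy.
    """
    expanded = dict(w)
    for (tag, topic), score in w.items():
        for neighbor in related_tags.get(tag, [])[:top_k]:
            key = (neighbor, topic)
            if key in w:
                continue  # editorial entries take precedence over propagated ones
            expanded[key] = max(expanded.get(key, 0.0), damping * score)
    return expanded
```

With `w = {("cat", "Animals"): 0.9}` and `related_tags = {"cat": ["kitten", "meow", "dog"]}`, the default settings add discounted Animals entries for "kitten" and "meow" and leave "dog" out, since it falls outside the top two.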

This computation is followed by a filtering phase to remove entries (post,topic) with a low confidence score. Finally, the category with the highest score is assigned to the post.
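The last two steps can be sketched in a few lines: drop low-confidence (post,topic) entries, then keep the best remaining topic. The threshold value and the function name are hypothetical; the post does not state the actual cutoff.

```python
from typing import Dict, Optional


def assign_category(
    scores: Dict[str, float], min_confidence: float = 0.3
) -> Optional[str]:
    """Return the highest-scoring topic above the threshold, or None.

    `scores` maps topic -> score for a single post. `min_confidence`
    is an assumed cutoff for the filtering phase.
    """
    confident = {t: s for t, s in scores.items() if s >= min_confidence}
    if not confident:
        return None  # the post stays uncategorized
    return max(confident, key=confident.get)
```

Returning `None` rather than forcing a label keeps weakly matched posts out of the categorized set, which matches the idea of filtering before the final assignment.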

Evaluation

This unsupervised approach to post categorization runs daily on posts created the day before. The next step is to assess the alignment between the predicted category and the most appropriate one.

The results of an editorial evaluation show that our framework is able to identify a relevant category in most cases, but they also highlight some limitations, such as limited robustness to polysemy.

We are currently looking into improving the overall performance by exploiting NLP techniques for word embedding and by integrating the extraction and analysis of visual features into the processing pipeline.

Some fun with data

What is the distribution of posts published on Tumblr? Which categories drive more engagement? To answer these and other questions, we analyzed the categorized posts over a period of 30 days.

Almost 7% of categorized posts belong to Fashion, with Art as runner-up.

The category that drives the most engagement is Television, which accounts for over 8% of the reblogs on categorized posts.

However, normalizing by the number of posts published, the category with the highest average engagement per post is Gif Art, followed by Astrology.
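To see why the ranking can flip after normalization, here is a small sketch that divides each category's reblogs by its post count. The function name is hypothetical and the numbers in the usage note are invented purely for illustration; they are not Tumblr data.

```python
from typing import Dict


def engagement_per_post(
    posts: Dict[str, int], reblogs: Dict[str, int]
) -> Dict[str, float]:
    """Average reblogs per post for every category with at least one post."""
    return {
        category: reblogs.get(category, 0) / count
        for category, count in posts.items()
        if count > 0
    }
```

With made-up counts such as `posts = {"Television": 1000, "Gif Art": 100}` and `reblogs = {"Television": 8000, "Gif Art": 2000}`, Television leads on total reblogs while Gif Art leads on the per-post average.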

Last but not least, here are the stats you all have been waiting for!! Cats are winning on Tumblr… for now…


Someone actually fucking did the math for this

assuming shes average height. her boobs appear to be about 1/3 her torso and average torso of a female being 22.6" her boobs are about 7.5" long. a foot is 12 inches. theyre moving at 5,600ft aka 67200 inches a second. her boobs are flopping 8960 times a second.

I didn’t think this could get better, but it did.

8960 flops per second would result in the shockwaves from her breasts emitting an 8960 Hz tone, which is actually a very shrill noise within the range of human hearing. You can enter 8960 into this website to hear an audio sample of what her breast-tone would approximately sound like

YES IT DID GET BETTER

unionmetrics

When’s the best time to post to Tumblr? 

We’ve been consuming the Tumblr firehose for more than a year at Union Metrics. In that time, we’ve processed more than 40 billion Tumblr posts, reblogs and likes! That’s billion. With a B.

We’re often asked when the best time to post to Tumblr is. So to answer that question, we analyzed more than 6 billion Tumblr activities (posts and notes) from the past two months to figure out when Tumblr is most active, and what that activity looks like over time.

We’ve found that weekends are the busiest days on Tumblr and Sunday is the most active day overall. Nights are the busiest times, no matter the day of the week. Post activity is at the highest at 4:00 pm EDT; notes peak at 10:00 pm EDT.

The heatmaps show Tumblr post and note activity; each square shows the intensity of activity during that day and hour. The darker the color, the busier that hour is. 
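If you want to build the same kind of day-by-hour grid for your own activity data, a minimal sketch could look like this. It assumes you already have a list of timestamps converted to the timezone you care about (the figures in this post are in EDT); the function name is made up for the example, and this is not how Union Metrics computes its heatmaps.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def activity_grid(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Count events per (weekday, hour) cell.

    Rows follow datetime.weekday(): 0 is Monday, 6 is Sunday.
    Columns are hours 0 through 23.
    """
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    return [[counts.get((day, hour), 0) for hour in range(24)] for day in range(7)]
```

Each cell of the returned 7 x 24 grid is the number of events in that day and hour, so the busiest cells are the ones a heatmap would draw darkest.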

So, what can you do with this info? Well, Tumblr is more active at night, so you’re more likely to have a bigger, more engaged audience then. Particularly later at night, when the highest reblog and like activity happens. But if you want less competition, there’s not much going on in the mornings, so you could give that a shot. If you’re a brand or business, you should consider scheduling content outside standard US business hours, especially on the weekends when more people are spending more time on Tumblr. This is particularly important, because compared to other social networks like Twitter and Facebook, Tumblr is much more active on the weekends. 

Want to see what works for your blog? Sign up to get on the waitlist for our new Union Metrics for Tumblr account!

thefrogman

I’ve been interested in this information for quite some time. For those of you who create things, I would definitely recommend posting your prime content during that 4pm to 10pm window. So often I see someone post a comic at a million o’clock in the morning and it makes me wince. Making great content is important, but using some strategy to give it the best chance at exposure is not something to be ignored. 
