When last semester started, I felt like I wasn’t enjoying my time in college much. As a Computer Science student, I was concerned that I had never really tried building something on my own. Instead, I was eager to be on Twitter, scrolling through the news and criticizing almost everything.
But then I thought:
couldn’t I be just as eager about what I study as I am about what I like?
After questioning myself, I decided to try creating a Twitter bot that mixes two things I am passionate about:
1. journalism, and
2. Python programming.
I came across a really good bot once: it replied to a news account’s tweets and summarized the news. Here’s an illustration of how the bot worked.
I found it very interesting because it simply:
1. saves you from clicking on click-bait news, and
2. prevents misinformation caused by an unclear headline.
I was very excited about the bot, but sadly it’s not active anymore. I tried to contact the creator about the account and looked for their GitHub in case the source code was ever published, but I found nothing.
So! I tried to recreate the bot with my very own approach.
*Please note that I am a very messy learner with a chaotic mindset; at least it works.
I learned and collected a few how-tos I should prepare before jumping in:
1. Connect to the Twitter API to take the link from a tweet.
2. Retrieve the news article content from the link with a scraping tool and store it in JSON format.
3. Summarize the content.
4. Convert the summarized article from JSON to an image.
5. Reply to the user with the image.
After that, I collected a function for each step (yes, this is inefficient) and put the puzzle pieces together.
Here, I elaborate on every step:
This is very easy, I discovered. First things first, I needed to make sure my Twitter account was approved as a Twitter Developer account so I could get a token and access to the API.
Once the application to Twitter Developer is accepted, I can get my keys and tokens, and I’m good to proceed.
Accessing the Twitter API with Python is simple. Just import Tweepy into the program and create a separate Python file that stores the keys from Twitter.
In that separate file, put the Consumer Key, Consumer Secret Key, Access Key, and Access Secret Key from the Twitter Developer dashboard.
In the main file, import Tweepy and the constants file, then set up the authentication.
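A minimal sketch of those two files, assuming Tweepy and placeholder credentials (the constant names here are illustrative, not the author’s):

```python
# keys.py is a separate constants file in the original; inlined here for brevity.
# All four values are placeholders — use your own from the Twitter Developer dashboard.
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"

def make_api():
    """Authenticate against the Twitter API (v1.1) and return a Tweepy client."""
    import tweepy  # imported inside the function so the constants stay importable on their own
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    return tweepy.API(auth)
```

Keeping the keys in their own file also makes it easy to leave them out of version control.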
With this, I can already use every feature in the Tweepy module! Next!
After setting up Tweepy, I proceeded to figure out how to get the link from a tweet that mentions the account.
I put the method in a separate function (sounds weird, but bear with me) that takes the mentions. By Tweepy’s rules, api.mentions_timeline() returns up to 20 mentions, but since I only need the latest one, I filter out the ones I’ve already handled by storing the last-used ID and taking the newer ones after it (tweet IDs are generated automatically in order, so newer tweets have higher IDs).
After that, I store that ID in a separate file which I named last_seen_id.txt; it’s used as the reference for the next latest mention.
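A sketch of that bookkeeping, using the last_seen_id.txt file name from the post; the function names are my own, and `since_id` is Tweepy’s way of asking only for mentions newer than a given ID:

```python
LAST_SEEN_FILE = "last_seen_id.txt"

def read_last_seen(path=LAST_SEEN_FILE):
    """Return the ID of the last mention we handled, or None on the first run."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None

def write_last_seen(tweet_id, path=LAST_SEEN_FILE):
    """Remember the newest mention ID for the next run."""
    with open(path, "w") as f:
        f.write(str(tweet_id))

def newest_mention(api):
    """Fetch only mentions newer than the stored ID and keep the latest one."""
    # mentions_timeline() returns up to 20 mentions, newest first;
    # since_id skips everything we have already replied to
    mentions = api.mentions_timeline(since_id=read_last_seen())
    if not mentions:
        return None
    latest = mentions[0]
    write_last_seen(latest.id)
    return latest
```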
After getting the tweet with the ID I need, I have to make sure the tweet I received actually contains a link. I count on regex for that.
Voila, here’s the code.
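The post doesn’t show the exact pattern, so here is a minimal sketch of the link check; the regex is my assumption, not necessarily the author’s:

```python
import re

# naive URL pattern — good enough for the t.co links Twitter embeds in tweet text
URL_RE = re.compile(r"https?://\S+")

def extract_link(tweet_text):
    """Return the first URL found in the tweet, or None if there is no link."""
    match = URL_RE.search(tweet_text)
    return match.group(0) if match else None
```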
The extracted link then goes to the next step:
I tried several scraping tools for this and decided that Scrapy is my favorite (it’s the one that worked first). I don’t fully understand how I finally got the function done (oh well), but here’s the code:
My content filtering is reckless and inefficient, but anyway.
This step scrapes the title and the body, and removes the tags that aren’t needed for the summarization. I selected them by their div names in the HTML. The reason I limited the sources is that it’s easier to target sections in the HTML when you know the structure; different sources mean different HTML structures.
Everything is stored in a list, so the content variable holds both the needed and unneeded parts, and I remove the unneeded ones using for loops.
The content I need is then dumped to a JSON file.
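I can’t reproduce the author’s Scrapy spider here (the div names are site-specific), but the cleanup-and-dump step described above can be sketched like this; the UNWANTED markers, function name, and file name are all hypothetical:

```python
import json

# fragments to strip from the scraped body (ads, "read also" links, etc.);
# these markers are illustrative — the real ones depend on the news site
UNWANTED = ("ADVERTISEMENT", "Baca juga")

def clean_and_dump(title, paragraphs, path="article.json"):
    """Remove unneeded fragments from the scraped paragraphs, then dump to JSON."""
    content = []
    for p in paragraphs:  # the "remove the unneeded using for loops" step
        p = p.strip()
        if p and not any(p.startswith(marker) for marker in UNWANTED):
            content.append(p)
    data = {"title": title, "content": " ".join(content)}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    return data
```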
This was the most crucial and hardest part, I guess; after this, things got much easier. The remaining steps were fun!
After getting the data from all the scraping and storing it in JSON, all I had to do to summarize it was (surprisingly) just import a library called Gensim and apply its summarization function to the text.
It works pretty well and is satisfying. The text is in Indonesian, but even if you don’t understand it, I can assure you it worked well enough.
And the code was just
This one is also pretty easy. After the text is set up, I just need a background picture to put it on. The library I use for this is Pillow.
Here’s how it looked.
Since the text is pretty long, I used the textwrap library to wrap it nicely. The picture is saved to a new JPEG file, and I pass that picture to the tweetit function.
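A minimal sketch of this step using Pillow’s default bitmap font (the author presumably used a nicer TTF via ImageFont.truetype); the sizes, margins, and wrap width are arbitrary choices of mine:

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_summary(text, out_path="summary.jpg", size=(1024, 512)):
    """Draw the wrapped summary onto a plain background and save it as a JPEG."""
    img = Image.new("RGB", size, color="white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()           # placeholder; swap in ImageFont.truetype(...)
    wrapped = textwrap.fill(text, width=60)   # wrap long lines so they fit the image
    draw.multiline_text((40, 40), wrapped, fill="black", font=font)
    img.save(out_path, "JPEG")
    return out_path
```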
Let me elaborate on the code:
I use another Tweepy function to reply directly to the account that mentioned me. The ID of the tweet is taken from the latest ID stored in last_seen_id.txt. The bot then sends a confirmation that the reply has been posted.
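A sketch of the reply step. I use Tweepy’s media_upload plus update_status pair here, which may differ from the exact call the author used, and the status text is illustrative:

```python
def tweetit(api, image_path, tweet):
    """Reply to the mentioning tweet with the rendered summary image."""
    media = api.media_upload(image_path)          # upload the JPEG first
    api.update_status(
        status="@{} here is your summary!".format(tweet.user.screen_name),
        in_reply_to_status_id=tweet.id,           # the ID stored in last_seen_id.txt
        media_ids=[media.media_id],
    )
```

Because the function only talks to the `api` object it is handed, it is easy to test with a stub before pointing it at the real Twitter API.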
So voila, that was it. I also used twisted.reactor for the looping, so Scrapy can run in a loop by itself. I think it’s still not really efficient and quite heavy, and the program gives up too easily after a few loops (like me), but yeah.
It’s really simple, yet it amused me for some reason. I bet my explanation seems a bit off and as messy as my code. But here’s my full code on my GitHub.
I hope this encourages beginners like me to start making programs with Python, because it’s quite simple. It really felt as if I were creating art.
Thank you for reading!